Foreword



A Special Volume Dedicated to Biswa Nath Datta 

The Indian Institute of Technology-Kharagpur (IIT-KGP) hosted an international 
workshop on "Numerical Linear Algebra in Signal, Systems, and Control", during 
January 9-11, 2007. The conference was sponsored by IEEE Kharagpur Section, 
Department of Electrical Engineering of IIT-KGP, and Systems Society of India- 
IIT Kharagpur Chapter. The convener of the workshop was Professor Aurobinda 
Routray of Electrical Engineering Department of IIT-KGP. 

The workshop was interdisciplinary in nature, blending linear and numerical
linear algebra with control and systems theory, and signal processing. Though a
few such conferences, such as the AMS conference on "Linear Algebra and its
Role in Systems Theory" and a series of four SIAM conferences on "Linear
Algebra in Signals, Systems, and Control", had been held before in the USA, this was
the first interdisciplinary conference of this type held in India. About one
hundred mathematicians, computational scientists, and engineers from several 
countries of the world, including Australia, Belgium, Brazil, Cyprus-Turkey, 
Germany, Hong Kong, India, Sri Lanka, USA, and Venezuela, participated in this 
workshop. The picture below shows the group of attendees. 
At the banquet of this workshop, Professor Biswa Nath Datta was honored by
the IEEE for his numerous contributions in the area of numerical linear algebra
with control and systems theory, and signal processing. The ceremony was presided
over by Professor N. Kishore, the then President of the IEEE Kharagpur Chapter,
and Professor Rajendra Bhatia, an eminent Indian mathematician from the Indian
Statistical Institute, was the principal banquet speaker. The other speakers were
the late Gene Golub, Professor at Stanford University, and Professors Paul Van
Dooren of Université Catholique de Louvain, Belgium, Volker Mehrmann of
Technische Universität Berlin, Germany, and V. K. Mohan of the Indian Institute of
Technology, Kharagpur, India. It was also decided at the end of this meeting to
dedicate a special volume to Biswa Datta. This volume contains papers presented 
at the workshop, as well as contributed and invited papers sent in by authors 
working in this area. All of these papers went through a regular refereeing process. 




As editors of this volume we are pleased to convey our best wishes to Biswa. 

Shankar Bhattacharyya, Raymond Chan, Vadim Olshevsky, Aurobinda Routray 
and Paul Van Dooren. 

Biosketch of Biswa Nath Datta 

Biswa Datta was born in the village of Bighira, West Bengal, India in 1941. After
completing his high school education at his village school and his bachelor's and
master's degrees at Calcutta University in India, he moved to London, England to
pursue higher studies in 1967. Primarily for economic reasons, he could not
complete his studies in London and in 1968 moved to Hamilton, Ontario, Canada,
where he completed his master's degree in mathematics in 1970.

Based on his master's thesis, written under Joseph Csima, he published his first
paper, "DAD Theorem for Nonnegative Symmetric Matrices", in the Journal of
Combinatorial Theory in 1972.

He then joined the University of Ottawa, Canada, and received his Ph.D. degree in
1972, under the direction of the late James H. Howland.

Immediately after his graduation from University of Ottawa, he got married to 
Karabi Datta, who at that time was also a fellow graduate student in mathematics 
at University of Ottawa and working under the direction of the late Professor 
Howland. 

Later that year, he met Air Vice Marshal S. Roychoudhury, the then
Director of the Gas Turbine Research Establishment (GTRE), Bangalore, India, who
was visiting North America at that time with the mission of recruiting "Indian
talents" from abroad. Mr. Roychoudhury offered both Biswa and Karabi
positions as scientific officers at GTRE. Additionally, Datta was appointed as the
Head of the Computational Mathematics Group in that organization. They both 
enthusiastically went back to India in 1973 with their new positions. Unfortu- 
nately, this non-academic government job was not stimulating and challenging 
enough for Datta and he decided to leave India again. After spending a few months 
at Ahmadu Bello University, Zaria, Nigeria, as a lecturer (1974) and about four 
years (1975-1980) at State University of Campinas, Campinas, Brazil as an 
Associate Professor of Computational Mathematics, he finally moved back to 
North America in 1980 as a visiting Associate Professor of Computer Science at 
Pennsylvania State University. 

Their two children, Rajarshi and Rakhi, were both born in Campinas, in 1976 and
1979, respectively. Biswa and Karabi now have two grandsons, Jayen (4) and
Shaan (2), whose parents are Rajarshi and Swati.

Datta joined Northern Illinois University in 1981 as a Full Professor in the
Mathematical Sciences Department. He was nominated by Hans Schneider and several
other prominent mathematicians for this position. There he, along with some of his
colleagues, developed the current Computational Mathematics Program at NIU. He
also played a key role in the planning and development of the Ph.D. Program in the
Mathematical Sciences Department, and served as the acting director of the
"Applications Involvement Component (AIC)" of the Ph.D. Program in mathematical
sciences from 1993 to 1996.

In 2001, he was appointed as a Presidential Research Professor at NIU and was 
elevated to the position of "Distinguished Research Professor" in 2005, which he 
currently holds. Datta has also held Visiting Professorships at the University of Illinois
(1985) and the University of California, San Diego (1987-1988), and numerous
short-term Visiting Professorships (including several Distinguished Visiting
Professorships) at institutes and research organizations in various countries around the world.
These include Australia, Brazil, Chile, China, England, France, Hong Kong, Greece, 
India, Mexico, Malaysia, Portugal, Spain, Taiwan, and Venezuela. 

Though Datta spent most of his academic career outside India, he maintained a 
close scientific relationship with India. In 1993, he was appointed as a member of 
overseas panel of scientists for the Council of Scientific and Industrial Research, 
Government of India. During 1993-2003, he made many short-term scientific 
visits to research laboratories and prominent institutes in India as a scientific 
advisor. He also contributed to the development of the scientific software for 
India's first supercomputer "Param". 
In recognition of his contributions to science and engineering, Datta has
received several awards and honors. He was honored in a special IEEE sponsored 
honoring ceremony during the banquet of the International Workshop on 
Numerical Linear Algebra in Signals, Systems and Control, IIT-Kharagpur, 2007, 
and more recently, in another honoring ceremony on the occasion of the First 
International Conference on Power, Control, Signals and Computations, held at 
the Vidya Academy of Science and Technology, Thrissur, India, 2010, where he 
was awarded a Gold Medal of Honor. Datta was also recognized by some of the 
world's leading linear and numerical linear algebraists in a special banquet 
honoring ceremony held during the IMA sponsored International Conference on 
Linear and Numerical Linear Algebra: Theory, Methods and Applications, held at 
Northern Illinois University, DeKalb, Illinois, August 12, 2009. Besides these 
professional recognitions, Datta has also received several other important 
professional awards and honors. He is an IEEE Distinguished Lecturer, an IEEE 
Fellow, an Academician of the Academy of Nonlinear Sciences, and is a recipient 
of NIU's Presidential Research Professorship, International Federation of 
Nonlinear Analysis Medal of Honor, Senior Fulbright Specialist award by US 
State Department and several Plaques of Honor awarded by local IEEE chapters 
of IIT-Kharagpur, India, and NIU. A special issue on "Inverse Problems in Science
and Industry" of the journal Numerical Linear Algebra with Applications
will be published in his honor in 2011.

Datta has versatile research interests ranging from theoretical and computa- 
tional linear algebra to control and systems theory and vibration engineering. 

• Contributions to inertia, stability, and D-stability

In the early part of his research career, he worked on the stability, inertia, and 
D-stability of matrices. He collaborated with several leading matrix theorists, 
including David Carlson, Paul Fuhrmann, Charles Johnson and Hans Schneider 
on this endeavor. In a series of papers published between 1976 and 1980, he
developed a unified matrix theoretic framework, via the historical Lyapunov
stability theory, for many apparently diverse classical root-separation results for
polynomials, which were obtained by celebrated mathematicians in the early
twentieth century. In the process, he derived simple and elementary
matrix theoretic proofs of several of these classical root-separation criteria (e.g.,
proofs of the Routh-Hurwitz-Fujiwara root-separation criterion and the stability
criterion of Liénard-Chipart via the Bezoutian). Most of the original proofs were
based on function theory and were long and involved. A key result proved
by him in this context was that the Bezoutian of two polynomials is a symmetrizer
of its associated companion matrix.

In 1979, in a paper published in Numerische Mathematik, he and David
Carlson developed a numerical algorithm for computing the inertia and stability
of a non-Hermitian matrix, and in 1980, along with David Carlson and Charles
Johnson, he developed a new characterization of D-stability for symmetric
tridiagonal matrices. The concept of D-stability, which arises in economics, was
originally formulated by Nobel Laureate Kenneth Arrow. An effective characterization of
D-stability still does not exist, and their result remains one of the state-of-the-art
results on this problem in the literature.

• Contributions to Computational Control Theory 

In the last two decades, Datta and several other prominent researchers, including
Paul Van Dooren, Alan Laub, Rajni Patel, Andras Varga, Volker Mehrmann,
and Peter Benner, have developed computationally effective algorithms
for control systems design and analysis using state-of-the-art numerical
linear algebra techniques. While there existed a high-level mathematical theory,
practically applicable algorithms were lacking in this area. These and other
previously existing algorithms have been the basis of one of Datta's books,
Numerical Methods for Linear Control Systems Design and Analysis, and two
software packages, the MATHEMATICA-based Control Systems Professional-Advanced
Numerical Methods and the MATLAB toolkit MATCONTROL. Datta has
delivered several invited workshops on Computer-aided Control Systems 
Design and Analysis based on this book and the software packages, at some of 
the leading IEEE and other conferences and at universities and research orga- 
nizations around the world. His work on "Large-Scale Computations in Con- 
trol" has been in the forefront of attempts made by him and several others to 
develop parallel and high-performance algorithms for control systems design. 
His paper Large-scale and Parallel Computations in Control, LAA, 1989, is one 
of the early research papers in this area. A few other important papers authored/ 
co-authored by him in this area include, Arnoldi methods for large Sylvester-like 
observer matrix equations and an associated algorithm for partial spectrum 
assignment (with Youcef Saad), LAA (1991), A parallel algorithm for Sylves- 
ter-observer equation (with Chris Bischof and A. Purkayastha), SIAM Journal 
of Scientific Computing (1996), Parallel algorithms for certain matrix compu- 
tations (with B. Codenotti and M. Leoncini), Theoretical Computer Science 
(1997), Parallel Algorithms in Control, Proc. IEEE Conf. Decision and Control 
(1991), and High performance computing in control, SIAM book on Parallel 
Processing for Scientific Computing (1993). While parallel and high perfor- 
mance algorithms were developed in many disciplines in science and engi- 
neering, control engineering was lagging behind. 

• Contributions to Quadratic Inverse Eigenvalue Problems 

The quadratic inverse eigenvalue problem (QIEP) is an emerging topic of 
research. Because of intrinsic mathematical difficulties and high computational 
complexities, research on QIEP has not been very well developed. Datta, in 
collaboration with several vibration engineers, numerical linear algebraists, and 
optimization specialists, including Z.-J. Bai, Moody Chu, Eric Chu, Sien Deng, 
Sylvan Elhay, Abhijit Gupta, Wen-Wei Lin, Yitshak Ram, Marcos Raydan, 
Kumar V. Singh, Jesse Prat, Jenn Nan Wang, C.S. Wang, and some of his 
former and current students, Sanjoy Brahma, Joao Carvalho, Joali Moreno, 
Daniil Sarkissian, and Vadim Sokolov, have made some significant contribu- 
tions to the development of numerically effective and practically applicable 
algorithms and associated mathematical theory for several important QIEPs 
arising in active vibration control and finite element model updating. A new
orthogonality relation for the quadratic matrix pencil derived in the much cited 
paper Orthogonality and partial pole assignment for the symmetric definite 
quadratic pencil (with S. Elhay and Y. Ram), LAA (1997) has played a key role 
in these works. 

Effective and practically applicable solutions to these problems pose some 
mathematically challenging and computationally difficult issues. The two major 
constraints are: (i) the problems must be solved using only a small number of 
eigenvalues and eigenvectors of the associated quadratic pencil which are 
computable using the state-of-the-art computational techniques, and (ii) the no
spill-over phenomenon (that is, keeping the large number of unassigned eigenvalues
and eigenvectors of the original pencil unchanged) must be ascertained
by means of mathematical theory. Furthermore, solutions of the robust and
minimum-norm quadratic partial eigenvalue assignment and some practical
aspects of the model updating problem lead to difficult nonlinear (in some cases
nonconvex) optimization problems, which give rise to additional challenges for
computing the required gradient formulas with only a small part of computable
eigenvalues and eigenvectors. Most of the current industrial techniques are
ad hoc and lack strong mathematical foundations. Datta and his collaborators
have adequately addressed some of these challenges in their work. One of the 
important thrusts of their work has been to develop a mathematical theory 
justifying some of the existing industrial techniques for which such a theory 
does not exist. These results have been published in some of the leading 
mathematics and vibration engineering journals, such as Journal of Sound and 
Vibration, Mechanical Systems and Signal Processing (MSSP), AIAA Journal. 
His recent joint paper (with Z.-J. Bai and J. Wang), Robust and minimum-norm 
partial eigenvalue assignment in vibrating systems, MSSP (2010) is a significant 
paper on the solution of robust and minimum-norm quadratic partial eigenvalue 
assignment arising in active vibration control.

Based on his current work on QIEP, Datta has delivered many plenary and
keynote talks (and several more are to be delivered this year) at interdisciplinary
conferences blending mathematics, computational mathematics and
optimization with vibration and control engineering. He has also served on the
editorial boards of more than a dozen mathematics and engineering journals,
including SIAM J. Matrix Analysis, Lin. Alg. Appl. (Special Editor), Num. Lin.
Alg. Appl., Dynamics of Continuous, Discrete and Impulsive Systems,
Mechanical Systems and Signal Processing (MSSP), Computational and Applied
Mathematics (Brazil) and others. He also edited or co-edited several special
issues of these journals; the most recent one is
on "Inverse Problems in Mechanical Systems and Signal Processing" for the
journal MSSP (with John Mottershead as a co-editor), published in 2009.

For more than 25 years, Datta, through his research, books, and other academic 
activities, has been actively contributing to promote interdisciplinary research 
blending linear and numerical linear algebra with control, systems, and signal 
processing, which has been a primary mission in his career. He has authored more
than 110 research papers, two books, and three associated software packages, all of
which are highly interdisciplinary. His book Numerical Linear Algebra and
Applications contains a wide variety of applications drawn from numerous
disciplines of science and engineering. His other book, Numerical Methods for Linear
Control Systems Design and Analysis, describes how sophisticated numerical
linear algebra techniques can be used to develop numerically reliable and
computationally effective algorithms for linear control systems design and analysis.
This book provides an interdisciplinary framework for studying linear control
systems. The software packages MATCOM and MATCONTROL are widely used
for classroom instruction, and Control Systems Professional-Advanced Numerical
Methods is used for both industrial applications and classroom instruction.

Datta took a leading role in organizing a series of interdisciplinary conferences.
The first such conference, chaired by Datta, was the AMS Summer Research
Conference on the Role of Linear Algebra in Signals, Systems, and Control in
1984, which was attended by many leading researchers in linear and numerical
algebra, control and systems theory, and signal processing. Remarkably, this was the
first conference ever supported by the AMS in linear algebra. Subsequently, Datta,
on invitation by Edward Block, the then managing director of SIAM, chaired and
organized the first SIAM Conference on Linear Algebra in Signals, Systems, and
Control in 1986. The huge success of this conference led to three more SIAM
conferences in this series, held in San Francisco, Seattle and Boston
in 1990, 1993, and 2001, respectively, which were chaired or co-chaired by Datta. He also
organized and co-chaired the interdisciplinary conference blending mathematics
with systems theory, Mathematical Theory of Networks and Systems (MTNS),
in 1996. He also served as a member of the international Steering Committee
of MTNS.

Datta has served as an editor of three interdisciplinary books that grew out of 
some of these conferences: The Role of Linear Algebra in Systems theory, AMS 
Contemporary Mathematics, volume 47, 1985; Linear Algebra in Signals, Systems 
and Control, SIAM, 1988; and Systems and Control in the Twenty-First Century, 
Birkhäuser, 1997. He was also the editor-in-chief of the two books in the series
Applied and Computational Control, Signals, and Circuits: vol. I, published by
Birkhäuser in 1999, and vol. II, by Kluwer Academic Publishers, 2001. He also took
an initiative in founding the well-known SIAM J. Matrix Analysis and Applica- 
tions, and served as one of the founding editors (with the late Gene Golub as the 
first Managing Editor) of this journal. He served as the Vice-Chair of the SIAM 
Linear Algebra Activity Group from 1993 to 1998, and chaired the award com- 
mittee of the Best SIAM Linear Algebra Papers in 1994 and 1997. He has also 
chaired or served as a member of several panels in linear algebra and control, 
including the NSF Panel on Future Directions of Research and Teaching in 
Mathematical Systems Theory, University of Notre Dame, 2002 of which he was 
the chair. 

So far, Datta has advised ten interdisciplinary Ph.D. dissertations and numerous
master's theses. During their graduate studies, these students acquired
interdisciplinary training by taking advanced courses from Datta on numerical
aspects of control and vibration engineering, and working on interdisciplinary
projects and dissertation topics. Almost all of these students picked up their
dissertation topics while taking the advanced interdisciplinary courses from him. Such
interdisciplinary expertise is in high demand in both academia and industry
worldwide, but is hard to find. Indeed, several of Datta's former students are now
working as industrial mathematicians and researchers in research laboratories:
Samar Choudhury at IBM, Vadim Sokolov at Argonne National Laboratory, Avijit
Purkayastha at the Texas Advanced Computing Center of the University of Texas, Dan
Pierce, formerly of the Boeing Company and now the CEO of Access Analytics
International, and M. Lagadapati at Caterpillar.
Chapter 1

The Anti-Reflective Transform and Regularization by Filtering

A. Aricò, M. Donatelli, J. Nagy and S. Serra-Capizzano



Abstract Filtering methods are used in signal and image restoration to reconstruct 
an approximation of a signal or image from degraded measurements. Filtering 
methods rely on computing a singular value decomposition or a spectral factor- 
ization of a large structured matrix. The structure of the matrix depends in part on 
imposed boundary conditions. Anti-reflective boundary conditions preserve con- 
tinuity of the image and its (normal) derivative at the boundary, and have been 
shown to produce superior reconstructions compared to other commonly used 
boundary conditions, such as periodic, zero and reflective. The purpose of this 
paper is to analyze the eigenvector structure of matrices that enforce anti-reflective 
boundary conditions. In particular, a new anti-reflective transform is introduced, 
and an efficient approach to computing filtered solutions is proposed. Numerical 
tests illustrate the performance of the discussed methods. 

1.1 Introduction 

In this paper we consider structured matrices that arise from the discretization of 
large scale ill-posed inverse problems, 

$$g = A f + \eta. \qquad (1.1)$$

Given the vector $g$ and matrix $A$, the aim is to compute an approximation of the
unknown vector $f$. The vector $\eta$ represents unknown errors (e.g., measurement or
discretization errors and noise) in the observed data. These problems arise in many 
applications, including image reconstruction, image deblurring, geophysics, 
parameter identification and inverse scattering; cf. [2, 9, 11, 12, 19]. We are 
mainly interested in problems that arise in spatially invariant signal and image 
restoration, where the observed data is 

$$g_i = \sum_{j \in \mathbb{Z}^d} f_j h_{i-j} + \eta_i, \qquad i \in \mathbb{Z}^d,$$

and the dimension $d = 1$ for signals (such as voice), and $d = 2$ or $3$ for images.
The $d$-dimensional tensor $h = [h_i]$ represents the blurring operator, and is called
the point spread function (PSF). Notice that we have an infinite summation
because a true signal or image scene does not have a finite boundary. However, the
data $g_i$ is collected only at a finite number of values, and thus represents only a
finite region of an infinite scene. Boundary conditions (BCs) are used to artificially 
describe the scene outside the viewable region. The PSF and the imposed BCs 
together define the matrix A. 

Typically the matrix A is very ill-conditioned and the degenerating subspace 
largely intersects the high frequency space: consequently regularization techniques 
are used to compute stable approximations of f with controlled noise levels [9-11, 
19]. Many choices of regularization can be employed, such as TSVD, Tikhonov, 
and total variation [9, 11, 19]. Analysis and implementation of regularization 
methods can often be simplified by computing a spectral (or singular value) 
decomposition of A. Unfortunately this may be very difficult for large scale 
problems, unless the matrix has exploitable structure. For example, if A is circulant 
then the spectral decomposition can be computed efficiently with the fast Fourier 
transform (FFT) [5]. In image deblurring, circulant structures arise when enforcing 
periodic boundary conditions. Periodic boundary conditions are convenient for 
computational reasons, but it is difficult to justify their use in a physical sense for 
most problems. 
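The circulant case can be illustrated with a small numerical sketch. The helper names (`circulant`, `circulant_matvec`, `tikhonov_periodic`) and the parameter `alpha` are assumptions made for this illustration and do not come from the paper. The sketch uses the fact that the eigenvalues of a circulant matrix are the FFT of its first column, so a Tikhonov-filtered solution under periodic boundary conditions needs only a few FFTs.

```python
import numpy as np


def circulant(c):
    """Dense circulant matrix whose first column is c (for checking only)."""
    c = np.asarray(c, dtype=float)
    n = c.size
    i, j = np.indices((n, n))
    return c[(i - j) % n]


def circulant_matvec(c, x):
    """Compute C @ x in O(n log n) via the FFT diagonalization of C."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))


def tikhonov_periodic(c, g, alpha):
    """Tikhonov-filtered solution of C f = g for a circulant C.

    alpha > 0 is the regularization parameter (a name chosen for this sketch).
    """
    lam = np.fft.fft(c)                       # eigenvalues of C
    g_hat = np.fft.fft(g)
    f_hat = np.conj(lam) * g_hat / (np.abs(lam) ** 2 + alpha)
    return np.real(np.fft.ifft(f_hat))


# Small demonstration with a symmetric, periodically wrapped blur.
n = 16
c = np.zeros(n)
c[[0, 1, -1]] = [0.5, 0.25, 0.25]             # first column of C
rng = np.random.default_rng(0)
f_true = rng.standard_normal(n)
g = circulant_matvec(c, f_true)
f_reg = tikhonov_periodic(c, g, alpha=1e-3)
```

The filtered solution agrees with the dense formula $(C^T C + \alpha I)^{-1} C^T g$, but is obtained without forming or factoring $C$.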

Other boundary conditions, which better describe the scene outside the view- 
able region have been proposed. For example, reflective boundary conditions 
assume the scene outside the viewable region is a reflection of the scene inside the 
viewable region. In this case the matrix A has a Toeplitz-plus-Hankel structure. If 
the blur satisfies a strong symmetry condition, $h_i = h_{|i|}$ for all $i \in \mathbb{Z}^d$, then the
spectral decomposition of A can be computed very efficiently using the fast dis- 
crete cosine transform (DCT) [14]. 




More recently, new anti-reflective boundary conditions (AR-BCs) have been 
proposed [17] and studied [1, 7, 8, 15, 18], which have the advantage that con- 
tinuity of the image, and of the normal derivative, are preserved at the boundary. 
This regularity, which is not shared with zero or periodic BCs, and only partially 
shared with reflective BCs, significantly reduces ringing artifacts that may occur 
with other boundary conditions. The matrix structure arising from the imposition 
of AR-BCs is Toeplitz-plus-Hankel (as in the case of reflective BCs), plus an 
additional structured low rank matrix. By linearity of the boundary conditions with 
respect to the PSF, it is evident that the set of AR-BC matrices is a vector space. 
Unfortunately it is not closed under multiplication or inversion. However, if we 
restrict our attention to strongly symmetric PSFs and assume that the PSF satisfies 
a mild finite extent condition (more precisely, $h_i = 0$ if $|i_j| \ge n - 2$, $i = (i_1, \ldots, i_d)$,
for some $j \in \{1, \ldots, d\}$), then any of the resulting AR-BC matrices belongs to a
$d$-level commutative matrix algebra denoted by $\mathcal{AR}^{(d)}$, see [1]. Certain computations
involving matrices in $\mathcal{AR}^{(d)}$ can be done efficiently. For example, matrix-vector
products can be implemented using FFTs, and solution of linear systems and 
eigenvalue computation involving these matrices can be done efficiently using 
mainly fast sine transforms (FSTs), see [1]. 

The new contribution of this paper is the analysis of the eigenvector structure of
$\mathcal{AR}^{(d)}$ matrices. The main result concerns the definition of the AR-transform,
which carries interesting functional information, is fast ($O(n^d \log(n))$ real
operations) and structured, but is not orthogonal. Then we use the resulting
eigenvector structure to define filtering-based regularization methods to reconstruct
approximate solutions of (1.1).

The paper is organized as follows. In Sect. 1.2, we review the main features of
the $\mathcal{AR}$ matrix algebra that are essential for describing and analyzing AR-BC
matrices. In Sect. 1.3 we introduce AR-BCs and in Sect. 1.4 we discuss the
spectral features of the involved matrices and define an AR-transform. In Sect. 1.5
we describe an efficient approach to compute a filtered (regularized) solution of
(1.1). In Sect. 1.6 some 1D numerical results validate the theoretical analysis. A
concise treatment of the multidimensional case is given in Sect. 1.7, together with
a 2D numerical example.

In the rest of the paper we consider essentially only the one-dimensional 
problem (that is, d = 1), to simplify the notation and mathematical analysis. 
However, the results generalize to higher dimensions, $d > 1$; comments and results
for extending the analysis are provided in Sect. 1.7. 



1.2 The Algebra of Matrices Induced by AR-BCs 

This section is devoted to describing the algebra $\mathcal{AR}_n = \mathcal{AR}$, $n \ge 3$. The notation
introduced in this section will be used throughout the paper, and is essential for the
description, given in Sect. 1.3, of the $n \times n$ matrices arising from the imposition of
AR-BCs.




1.2.1 The τ Algebra

Let $Q$ be the type I sine transform matrix of order $n$ (see [3]) with entries

$$[Q]_{i,j} = \sqrt{\frac{2}{n+1}} \sin\left(\frac{ij\pi}{n+1}\right), \qquad i, j = 1, \ldots, n.$$

It is known that the real matrix $Q$ is orthogonal and symmetric ($Q^{-1} = Q^T = Q$).
For any $n$-dimensional real vector $v$, the matrix-vector multiplication $Qv$ (DST-I
transform) can be computed in $O(n \log(n))$ real operations by using the algorithm
FST-I.

Let $\tau$ be the space of all the matrices that can be diagonalized by $Q$:

$$\tau = \{QDQ : D \text{ is a real diagonal matrix of size } n\}. \qquad (1.3)$$

Let $X = QDQ \in \tau$, then $QX = DQ$. Consequently, if we let $e_1$ denote the first
column of the identity matrix, then the relationship $QXe_1 = DQe_1$ implies that the
eigenvalues $[D]_{i,i}$ of $X$ are given by $[D]_{i,i} = [Q(Xe_1)]_i / [Qe_1]_i$, $i = 1, \ldots, n$.
Therefore the eigenvalues of $X$ can be obtained by applying a DST-I transform to
the first column of $X$ and, in addition, any matrix in $\tau$ is uniquely determined by its
first column.
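As a minimal sketch of this eigenvalue formula, the snippet below builds the DST-I matrix densely, forms a matrix in the class with prescribed eigenvalues, and recovers them from its first column. The function name `dst1_matrix` is an assumption for illustration, and the dense product costs $O(n^2)$; a fast sine transform routine would replace it in practice.

```python
import numpy as np


def dst1_matrix(n):
    """Orthogonal, symmetric DST-I matrix: sqrt(2/(n+1)) * sin(i*j*pi/(n+1))."""
    idx = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(idx, idx) * np.pi / (n + 1))


n = 8
Q = dst1_matrix(n)
rng = np.random.default_rng(1)
d = rng.standard_normal(n)            # prescribed eigenvalues
X = Q @ np.diag(d) @ Q                # a matrix diagonalized by Q

first_col = X[:, 0]                   # X e_1
eigs = (Q @ first_col) / Q[:, 0]      # [Q (X e_1)]_i / [Q e_1]_i
```

The division is safe because every entry of $Qe_1$ is a positive multiple of $\sin(i\pi/(n+1))$ with $1 \le i \le n$.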

Now we report a characterization of the $\tau$ class, which is important for
analyzing the structure of AR-BC matrices. Let us define the shift of any vector
$h = [h_0, \ldots, h_{n-1}]^T$ as $\sigma(h) = [h_1, h_2, \ldots, h_{n-1}, 0]^T$. According to a Matlab-like
notation, we define $T(x)$ to be the $n$-by-$n$ symmetric Toeplitz matrix whose first
column is $x$ and $H(x, y)$ to be the $n$-by-$n$ Hankel matrix whose first and last
columns are $x$ and $y$, respectively. Every matrix of the class (1.3) can be written
as (see [3])

$$T(h) - H(\sigma^2(h), J\sigma^2(h)), \qquad (1.4)$$

where $h = [h_0, \ldots, h_{n-1}]^T \in \mathbb{R}^n$ and $J$ is the matrix with entries $[J]_{s,t} = 1$ if
$s + t = n + 1$ and zero otherwise. We refer to $J$ as a "flip" matrix because the
multiplication $Jx$ has the effect of flipping the entries of the vector $x$ in an up/down
direction. The structure defined by (1.4) means that matrices in the $\tau$ class are
special instances of Toeplitz-plus-Hankel matrices.

Moreover, the eigenvalues of the $\tau$ matrix in (1.4) are given by the cosine
function $h(y)$ evaluated at the grid points vector $G_n = \left[\frac{k\pi}{n+1}\right]_{k=1}^{n}$, where

$$h(y) = \sum_{|j| \le n-1} h_j \exp(\mathrm{i} j y), \qquad (1.5)$$

$\mathrm{i}^2 = -1$, and $h_j = h_{|j|}$ for $|j| \le n - 1$. The $\tau$ matrix in (1.4) is usually denoted by
$\tau(h)$ and is called the $\tau$ matrix generated by the function or symbol $h = h(\cdot)$ (see
the seminal paper [3] where this notation was originally proposed).
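The Toeplitz-minus-Hankel form and its eigenvalues can be checked numerically. The sketch below is an illustration under stated assumptions: the names `tau_from_coeffs` and `symbol` are hypothetical, the matrices are formed densely, and the symbol is evaluated in its real form $h(y) = h_0 + 2\sum_{j \ge 1} h_j \cos(jy)$, which follows from (1.5) and the symmetry of the coefficients.

```python
import numpy as np


def dst1_matrix(n):
    """Orthogonal, symmetric DST-I matrix of order n."""
    idx = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(idx, idx) * np.pi / (n + 1))


def tau_from_coeffs(h):
    """Assemble T(h) - H(sigma^2(h), J sigma^2(h)) for h = [h_0, ..., h_{n-1}]."""
    h = np.asarray(h, dtype=float)
    n = h.size
    i, j = np.indices((n, n))
    T = h[np.abs(i - j)]                          # symmetric Toeplitz part
    x = np.concatenate([h[2:], np.zeros(2)])      # sigma^2(h): two shifts
    y = x[::-1]                                   # J sigma^2(h)
    s = i + j                                     # anti-diagonal index 0..2n-2
    H = np.where(s <= n - 1,
                 x[np.minimum(s, n - 1)],         # upper-left anti-diagonals
                 y[np.maximum(s - (n - 1), 0)])   # lower-right anti-diagonals
    return T - H


def symbol(h, y):
    """Evaluate h(y) = h_0 + 2 * sum_{j>=1} h_j cos(j y) on an array y."""
    h = np.asarray(h, dtype=float)
    j = np.arange(1, h.size)
    return h[0] + 2.0 * np.cos(np.outer(y, j)) @ h[1:]


n = 7
h = np.array([1.0, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0])
M = tau_from_coeffs(h)
Q = dst1_matrix(n)
grid = np.arange(1, n + 1) * np.pi / (n + 1)      # G_n
eigenvalues = symbol(h, grid)
```

With these definitions, `Q @ M @ Q` should be the diagonal matrix of `eigenvalues`, which is the statement of (1.4) and (1.5) in matrix form.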
1.2.2 The AR-Algebras $\mathcal{AR}$

Let $h = h(\cdot)$ be a real-valued cosine polynomial of degree $l$ and let
$\tau_k(h) = Q\,\mathrm{diag}(h(G_k))\,Q$ (note that $\tau_k(h)$ coincides with the matrix in (1.4)-(1.5), when
$l \le k - 1$). Then the Fourier coefficients of $h$ are such that $h_i = h_{-i} \in \mathbb{R}$ with
$h_i = 0$ if $|i| > l$, and for $k = n - 2$ we can define the one-level $AR_n(\cdot)$ operator

$$AR_n(h) = \begin{bmatrix} h(0) & & \\ v_{n-2}(h) & \tau_{n-2}(h) & J v_{n-2}(h) \\ & & h(0) \end{bmatrix}, \qquad (1.6)$$

where $J$ is the flip matrix, $v_{n-2}(h) = \tau_{n-2}(\phi(h))\, e_1$ and

$$(\phi(h))(y) = \frac{h(y) - h(0)}{2\cos(y) - 2}.$$

It is interesting to observe that $h(y) - h(0)$ has a zero of order at least 2 at zero
(since $h$ is a cosine polynomial) and therefore $\phi(h) = (\phi(h))(\cdot)$ is still a cosine
polynomial of degree $l - 1$, whose value at zero is $-h''(0)/2$ (in other words the
function is well defined at zero).

As proved in [1], with the above definition of the operator $AR_n(\cdot)$, we have

1. $\alpha AR_n(h_1) + \beta AR_n(h_2) = AR_n(\alpha h_1 + \beta h_2)$,

2. $AR_n(h_1) AR_n(h_2) = AR_n(h_1 h_2)$,

for real $\alpha$ and $\beta$ and for cosine functions $h_1 = h_1(\cdot)$ and $h_2 = h_2(\cdot)$.

These properties allow us to define ATZ as the algebra (closed under linear 
combinations, product and inversion) of matrices ATZ„{h), with h being a cosine 
polynomial. By standard interpolation arguments it is easy to see that ATZ can be 
defined as the set of matrices ARn{h), where his a cosine polynomial of degree at 
most n — 3. Therefore, it is clear that dm\{ATV) = « — 2. Moreover the algebra 
ATZ is commutative thanks to point 2, since hi{y)h2{y) = h2{y)h\{y) at every y. 
Consequently, if matrices of the form AR„{h) are diagonalizable, then they must 
have the same set of eigenvectors [13]. This means there must exist an "anti- 
reflective transform" that diagonalizes the matrices in ATZ. Unfortunately this 
transform fails to be unitary, since the matrices in ATZ are generically not normal. 
However the AR transform is close in rank to an orthogonal linear mapping. 
Development of this transform is the main contribution of this paper, and is 
discussed in detail in Sect. 1.4. 
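The product property in point 2 can be illustrated with a small dense computation. This is a sketch of ours under stated assumptions: $Q_k$ is taken to be the DST-I matrix, $\phi(h)(y) = (h(y) - h(0))/(2\cos(y) - 2)$, and the function names `tau`, `ar_matrix` and `matmul` are hypothetical helpers, not notation from [1].

```python
import math

def tau(k, symbol):
    """tau_k(h) = Q_k diag(h(G_k)) Q_k with the DST-I matrix Q_k (assumed form)."""
    c = math.sqrt(2.0 / (k + 1))
    q = [[c * math.sin(i * j * math.pi / (k + 1)) for j in range(1, k + 1)]
         for i in range(1, k + 1)]
    eig = [symbol(j * math.pi / (k + 1)) for j in range(1, k + 1)]
    return [[sum(q[r][j] * eig[j] * q[j][s] for j in range(k)) for s in range(k)]
            for r in range(k)]

def ar_matrix(n, h):
    """Dense AR_n(h) following the block structure (1.6); h is a callable symbol."""
    k = n - 2
    h0 = h(0.0)

    def phi(y):
        # Only evaluated at grid points in (0, pi), where the denominator is nonzero.
        return (h(y) - h0) / (2.0 * math.cos(y) - 2.0)

    v = [row[0] for row in tau(k, phi)]      # tau(phi(h)) e_1 is the first column
    t = tau(k, h)
    a = [[0.0] * n for _ in range(n)]
    a[0][0] = a[n - 1][n - 1] = h0
    for i in range(k):
        a[i + 1][0] = v[i]
        a[i + 1][n - 1] = v[k - 1 - i]       # flipped vector J v
        a[i + 1][1:n - 1] = t[i]
    return a

def matmul(a, b):
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]
```

For instance, with $h_1(y) = 0.5 + 0.5\cos(y)$, $h_2(y) = 0.6 + 0.4\cos(2y)$ and $n = 8$, the matrices `matmul(ar_matrix(8, h1), ar_matrix(8, h2))` and `ar_matrix(8, lambda y: h1(y) * h2(y))` should agree up to rounding.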



1.3 AR-BCs and the AR-BC Matrices 

In this section we describe the anti-reflective BCs. We have already mentioned
that, in the generic case, periodic and zero Dirichlet BCs introduce a discontinuity
in the signal, while the reflective BCs preserve the continuity of the signal, but
introduce a discontinuity in the derivative. Our approach is to use an anti-reflection:
in this way, at the boundaries, instead of having a mirror-like symmetry
(reflective BCs), we impose a global symmetry around the boundary points. This
latter choice corresponds to a central symmetry around the considered boundary
point. If $f_1$ is the left boundary point and $f_n$ is the right one, then the external points
$f_{1-j}$ and $f_{n+j}$, $j \ge 1$, are computed as a function of the internal points according to
the rules $f_{1-j} - f_1 = -(f_{j+1} - f_1)$ and $f_{n+j} - f_n = -(f_{n-j} - f_n)$. If the support of the
centered blurring function is $q = 2m + 1 \le n$, then for $j = 1, \ldots, m$, we have

$$f_{1-j} = 2 f_1 - f_{j+1}, \qquad f_{n+j} = 2 f_n - f_{n-j}.$$
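The two extension rules translate directly into a padding routine. The following is a minimal sketch (the function name `antireflective_extend` is ours); indices are 0-based, so `f[0]` plays the role of $f_1$ and `f[n - 1]` that of $f_n$.

```python
def antireflective_extend(f, m):
    """Pad f with m samples per side: f_{1-j} = 2 f_1 - f_{j+1}, f_{n+j} = 2 f_n - f_{n-j}."""
    n = len(f)
    if m > n - 1:
        raise ValueError("the extension needs m <= n - 1 interior samples")
    # Left samples f_{1-m}, ..., f_0 (j runs from m down to 1).
    left = [2.0 * f[0] - f[j] for j in range(m, 0, -1)]
    # Right samples f_{n+1}, ..., f_{n+m}.
    right = [2.0 * f[n - 1] - f[n - 1 - j] for j in range(1, m + 1)]
    return left + list(f) + right
```

A linear signal is extended linearly, which is the point of the central symmetry: both the signal and its slope stay continuous across the boundary.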

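Following the analysis given in [17], if the blurring function (the PSF) $h$ is
symmetric (i.e., $h_i = h_{-i}$, $\forall i \in \mathbb{Z}$), if $h_i = 0$ for $|i| \ge n - 2$ (degree condition), and
if $h$ is normalized so that $\sum_{i=-m}^{m} h_i = 1$, then the structure of the $n \times n$
anti-reflective blurring matrix $A$ is

$$A = \begin{bmatrix} z_0 & 0^T & 0 \\ z & \hat{A} & Jz \\ 0 & 0^T & z_0 \end{bmatrix}, \qquad z = [z_1, \ldots, z_m, 0, \ldots, 0]^T \in \mathbb{R}^{n-2}, \qquad (1.8)$$

where $A_{1,1} = A_{n,n} = z_0 = 1$, $z_j = h_j + 2\sum_{k=j+1}^{m} h_k$, $\hat{A}$ has order $n - 2$ and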



(1.9) 



with $h = [h_0, h_1, \ldots, h_m, 0, \ldots, 0]^T$. According to the brief discussion of Sect. 1.2.1,
relation (1.9) implies that $\hat{A} = \tau_{n-2}(h)$ with $h(y) = h_0 + 2\sum_{k=1}^{m} h_k \cos(ky)$ (see
(1.4) and (1.5)). Moreover in [1] it is proved that $A = AR_n(h)$.
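The structure of $A$ can be reproduced by applying the anti-reflective rules $f_{1-j} = 2f_1 - f_{j+1}$ and $f_{n+j} = 2f_n - f_{n-j}$ while assembling the convolution row by row. The sketch below is ours (the name `antireflective_blur_matrix` and the sample PSF are hypothetical); it takes the half PSF $[h_0, \ldots, h_m]$ and uses 0-based indices.

```python
def antireflective_blur_matrix(h_half, n):
    """Dense n x n AR-BC blurring matrix for a symmetric PSF h_half = [h_0, ..., h_m]."""
    m = len(h_half) - 1
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):                       # row i computes g_i = sum_k h_k f_{i+k}
        for k in range(-m, m + 1):
            j = i + k                        # sample reached by the PSF (0-based)
            w = h_half[abs(k)]
            if 0 <= j < n:
                a[i][j] += w
            elif j < 0:                      # f_j = 2 f_0 - f_{-j}
                a[i][0] += 2.0 * w
                a[i][-j] -= w
            else:                            # f_j = 2 f_{n-1} - f_{2(n-1)-j}
                a[i][n - 1] += 2.0 * w
                a[i][2 * (n - 1) - j] -= w
    return a

# Hypothetical normalized PSF: h_0 + 2 h_1 + 2 h_2 = 1.
A = antireflective_blur_matrix([0.5, 0.2, 0.05], 7)
```

With this PSF, $z_1 = h_1 + 2h_2 = 0.3$ and $z_2 = h_2 = 0.05$, so the first column of `A` should read $[1, 0.3, 0.05, 0, \ldots, 0]^T$ and the first row should be $e_1^T$.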

1.4 Eigenvalues and Eigenvectors of AR-BC Matrices 

In this section we first describe the spectrum of AR-BC matrices, under the usual
mild degree condition (that is, the PSF $h$ has finite support), with symmetric,
normalized PSFs. Then we describe the eigenvector structure and we introduce the
AR-transform. In the next section, we will use these results to efficiently compute
filtered solutions of (1.1) when the blurring matrix $A$ is in the $\mathcal{AR}$ algebra.



1.4.1 Eigenvalues of ARn{-) Operators 



The spectral structure of any AR-BC matrix, with symmetric PSF $h$, is concisely
described in the following result.

Theorem 1 [1] Let the blurring function (PSF) $h$ be symmetric (i.e., $h_s = h_{-s}$),
normalized, and satisfying the usual degree condition. Then the eigenvalues of the
$n \times n$ AR-BC blurring matrix $A$ in (1.8), $n \ge 3$, are given by $h(0) = 1$ with
multiplicity two and $h(G_{n-2})$.

The proof can be easily derived from (1.6), which shows that the eigenvalues of
$AR_n(h)$ are $h(0)$ with multiplicity 2 and those of $\tau_{n-2}(h)$, i.e. $h(G_{n-2})$, with
multiplicity 1 each.



1.4.2 The AR-Transform and Its Functional Interpretation 

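Here we will determine the eigenvectors of every matrix $AR_n(h)$. In particular, we
show that every AR-BC matrix is diagonalizable, and we demonstrate independence
of the eigenvectors from the symbol $h$. With reference to the notation in
(1.2-1.5), calling $q_j^{(n-2)}$ the $j$th column of $Q_{n-2}$, and $y_j^{(n-2)}$ the $j$th point of
$G_{n-2}$, $j = 1, \ldots, n - 2$, we have

$$AR_n(h) \begin{bmatrix} 0 \\ q_j^{(n-2)} \\ 0 \end{bmatrix} = \begin{bmatrix} h(0) & 0^T & 0 \\ v_{n-2}(h) & \tau_{n-2}(h) & J v_{n-2}(h) \\ 0 & 0^T & h(0) \end{bmatrix} \begin{bmatrix} 0 \\ q_j^{(n-2)} \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ \tau_{n-2}(h)\, q_j^{(n-2)} \\ 0 \end{bmatrix} = h(y_j^{(n-2)}) \begin{bmatrix} 0 \\ q_j^{(n-2)} \\ 0 \end{bmatrix}, \qquad (1.10)$$

since $q_j^{(n-2)}$ is an eigenvector of $\tau_{n-2}(h)$ and $h(y_j^{(n-2)})$ is the related eigenvalue.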

Due to the centro-symmetry of the involved matrix, if $[1, p^T, 0]^T$ is an eigenvector
of $AR_n(h)$ related to the eigenvalue $h(0)$, then the other is its flip, i.e.,
$[0, (Jp)^T, 1]^T$. Let us look for this eigenvector, by imposing the equality

$$AR_n(h) \begin{bmatrix} 1 \\ p \\ 0 \end{bmatrix} = h(0) \begin{bmatrix} 1 \\ p \\ 0 \end{bmatrix},$$

which is equivalent to seeking a vector $p$ that satisfies

$$v_{n-2}(h) + \tau_{n-2}(h)\, p = h(0)\, p.$$

Since $v_{n-2}(h) = \tau_{n-2}(\phi(h))\, e_1$ by definition of the operator $v_{n-2}(\cdot)$ (see (1.6) and
the lines below), and, because of the algebra structure of $\tau_{n-2}$ and thanks to (1.7),
we deduce that the vector $p$ satisfies the relation

$$\tau_{n-2}(h - h(0)) \left[ -L_{n-2}^{-1} e_1 + p \right] = 0, \qquad (1.11)$$
where $L_{n-2}$ is the discrete one-level Laplacian, i.e., $L_{n-2} = \tau_{n-2}(2 - 2\cos(\cdot))$.
Therefore by (1.11) the solution is given by $p = L_{n-2}^{-1} e_1 + r$ where $r$ is any vector
belonging to the kernel of $\tau_{n-2}(h - h(0))$. If $\tau_{n-2}(h - h(0))$ is invertible (as
happens for every nontrivial PSF, since the unique maximum of the function is
reached at $y = 0$, which is not a grid point of $G_{n-2}$), then the solution is unique.
Otherwise $r$ will belong to the vector space generated by those vectors $q_j^{(n-2)}$ for
which the index $j$ is such that $h(y_j^{(n-2)}) = h(0)$. However, the contribution contained
in $r$ was already considered in (1.10), and therefore $p = L_{n-2}^{-1} e_1$ is the only
solution that carries new information. Hence, independently of $h$, we have

$$AR_n(h) \begin{bmatrix} 1 & 0^T & 0 \\ p & Q_{n-2} & Jp \\ 0 & 0^T & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0^T & 0 \\ p & Q_{n-2} & Jp \\ 0 & 0^T & 1 \end{bmatrix} \begin{bmatrix} h(0) & & \\ & \mathrm{diag}(h(G_{n-2})) & \\ & & h(0) \end{bmatrix}.$$

Now we observe that the $j$th eigenvector has unit Euclidean norm, $j = 2, \ldots, n - 1$,
because $Q_{n-2}$ is unitary: we wish to impose the same condition on the first and the
last eigenvector. The interesting fact is that $p$ has an explicit expression. By using
standard finite difference techniques, it follows that $p_j = 1 - j/(n - 1)$ so that the first
eigenvector is exactly the sampling of the function $1 - x$ on the grid $j/(n - 1)$ for
$j = 0, \ldots, n - 1$. Its Euclidean norm is $\alpha_n = \sqrt{\sum_{j=0}^{n-1} j^2/(n-1)^2} \sim \sqrt{n/3}$, where,
for nonnegative sequences $\beta_n$, $\gamma_n$, the relation $\gamma_n \sim \beta_n$ means $\gamma_n = \beta_n(1 + o(1))$. In
this way, the (normalized) AR-transform can be defined as

$$T_n = \begin{bmatrix} \alpha_n^{-1} & 0^T & 0 \\ \alpha_n^{-1} p & Q_{n-2} & \alpha_n^{-1} Jp \\ 0 & 0^T & \alpha_n^{-1} \end{bmatrix}. \qquad (1.12)$$
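For the reader's convenience, the closed form of $p$ and the asymptotics of $\alpha_n$ can be verified by hand. The short check below is our own worked computation; it assumes $L_{n-2}$ is the tridiagonal matrix with stencil $(-1, 2, -1)$ and uses the conventions $p_0 = 1$, $p_{n-1} = 0$.

```latex
% p_j = 1 - j/(n-1) is linear in j, hence -p_{j-1} + 2 p_j - p_{j+1} = 0 for 1 <= j <= n-2.
\begin{align*}
(L_{n-2}\,p)_1     &= 2p_1 - p_2 = p_0 + (-p_0 + 2p_1 - p_2) = p_0 = 1, \\
(L_{n-2}\,p)_j     &= -p_{j-1} + 2p_j - p_{j+1} = 0, \qquad 2 \le j \le n-3, \\
(L_{n-2}\,p)_{n-2} &= -p_{n-3} + 2p_{n-2} = p_{n-1} = 0,
\end{align*}
% so L_{n-2} p = e_1. For the norm of the first eigenvector:
\begin{equation*}
\alpha_n^2 = \sum_{j=0}^{n-1} \frac{j^2}{(n-1)^2}
           = \frac{(n-1)\,n\,(2n-1)}{6\,(n-1)^2}
           = \frac{n(2n-1)}{6(n-1)}
           = \frac{n}{3}\,\bigl(1 + o(1)\bigr).
\end{equation*}
```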



Remark 1 With the normalization condition in (1.12), all the columns of $T_n$ have
unit norm. However orthogonality is only partially fulfilled since it holds for the
central columns, while the first and last columns are not orthogonal to each other,
and neither one is orthogonal to the central columns. We can solve the first
problem: the sum of the first and of the last column (suitably normalized) and the
difference of the first and the last column (suitably normalized) become orthonormal,
and are still eigenvectors related to the eigenvalue $h(0)$. However, since
$q_1^{(n-2)}$ has only positive components and the vector space generated by the first and
the last column of $T_n$ contains positive vectors, it follows that $T_n$ cannot be made
orthonormal just by operating on the first and the last column. Indeed, we do not
want to change the central block of $T_n$ since it is related to a fast $O(n \log(n))$ real
transform and hence, necessarily, we cannot get rid of this quite mild lack of
orthogonality.

Remark 2 There is a suggestive functional interpretation of the transform $T_n$.
When considering periodic BCs, the transform of the related matrices is the
Fourier transform: its $j$th column vector, up to a normalizing scalar factor, can be
viewed as a sampling, over a suitable uniform gridding of $[0, 2\pi]$, of the frequency
function $\exp(-\mathrm{i} j y)$. Analogously, when imposing reflective BCs with a symmetric
PSF, the transform of the related matrices is the cosine transform: its $j$th column
vector, up to a normalizing scalar factor, can be viewed as a sampling, over a
suitable uniform gridding of $[0, \pi]$, of the frequency function $\cos(jy)$. Here the
imposition of the anti-reflective BCs can be functionally interpreted as a linear
combination of sine functions and of linear polynomials (whose use is exactly
required for imposing $C^1$ continuity at the borders).

The previous observation becomes evident in the expression of $T_n$ in
(1.12). Indeed, by defining the one-dimensional grid $\tilde{G}_n = [0, G_{n-2}^T, \pi]^T =
[j\pi/(n - 1)]_{j=0}^{n-1}$, which is a subset of $[0, \pi]$, we infer that the first column of $T_n$ is
given by $\alpha_n^{-1}(1 - y/\pi)|_{\tilde{G}_n}$, the $j$th column of $T_n$, $j = 2, \ldots, n - 1$, is given by
$\sqrt{2/(n - 1)}\,(\sin((j-1)y))|_{\tilde{G}_n}$, and finally the last column of $T_n$ is given by
$\alpha_n^{-1}(y/\pi)|_{\tilde{G}_n}$, i.e.

$$T_n = \left[ 1 - \frac{y}{\pi},\ \sin(y),\ \ldots,\ \sin((n-2)y),\ \frac{y}{\pi} \right]\Big|_{\tilde{G}_n} \cdot \Delta_n, \qquad (1.13)$$

$$\Delta_n = \mathrm{diag}\left( \alpha_n^{-1},\ \sqrt{\frac{2}{n-1}}\, I_{n-2},\ \alpha_n^{-1} \right).$$



Finally, it is worth mentioning that the inverse transform is also described in
terms of the same block structure since

$$T_n^{-1} = \begin{bmatrix} \alpha_n & 0^T & 0 \\ -Q_{n-2}\,p & Q_{n-2} & -Q_{n-2}\,Jp \\ 0 & 0^T & \alpha_n \end{bmatrix}. \qquad (1.14)$$

Theorem 2 [$AR_n(\cdot)$ Jordan Canonical Form.] With the notation and assumptions
of Theorem 1, the $n \times n$ AR-BC blurring matrix $A$ in (1.8), $n \ge 3$, coincides with

$$AR_n(h) = T_n\, \mathrm{diag}(h(\hat{G}_n))\, T_n^{-1}, \qquad (1.15)$$

where $T_n$ and $T_n^{-1}$ are defined in (1.13) and (1.14), while $\hat{G}_n = [0, G_{n-2}^T, 0]^T$.
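The block formulas for the transform and its inverse are easy to check on a small example. The sketch below is ours: it assembles dense versions of $T_n$ and $T_n^{-1}$ from $p_j = 1 - j/(n-1)$, the DST-I matrix $Q_{n-2}$ (assumed standard form) and $\alpha_n$; the function name `ar_transform` is hypothetical.

```python
import math

def ar_transform(n):
    """Return dense (T_n, T_n^{-1}) following the block forms (1.12) and (1.14)."""
    k = n - 2
    c = math.sqrt(2.0 / (k + 1))
    q = [[c * math.sin((i + 1) * (j + 1) * math.pi / (k + 1)) for j in range(k)]
         for i in range(k)]
    p = [1.0 - j / (n - 1.0) for j in range(1, k + 1)]
    jp = p[::-1]                                   # flipped vector J p
    alpha = math.sqrt(sum((j / (n - 1.0)) ** 2 for j in range(n)))
    qp = [sum(q[i][j] * p[j] for j in range(k)) for i in range(k)]
    qjp = [sum(q[i][j] * jp[j] for j in range(k)) for i in range(k)]
    t = [[0.0] * n for _ in range(n)]
    tinv = [[0.0] * n for _ in range(n)]
    t[0][0] = t[n - 1][n - 1] = 1.0 / alpha
    tinv[0][0] = tinv[n - 1][n - 1] = alpha
    for i in range(k):
        t[i + 1][0] = p[i] / alpha
        t[i + 1][n - 1] = jp[i] / alpha
        tinv[i + 1][0] = -qp[i]
        tinv[i + 1][n - 1] = -qjp[i]
        for j in range(k):
            t[i + 1][j + 1] = tinv[i + 1][j + 1] = q[i][j]
    return t, tinv
```

Multiplying `t`, the diagonal of symbol values on $[0, G_{n-2}^T, 0]^T$, and `tinv` for a sample symbol such as $h(y) = 0.5 + 0.4\cos(y) + 0.1\cos(2y)$ should return a matrix whose first row is $e_1^T$ and whose first column starts with $1, 0.3, 0.05$.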



1.5 Filtering Methods for AR-BC Matrices 



As mentioned in Sect. 1.1, regardless of the imposed boundary conditions, matrices
$A$ that arise in signal and image restoration are typically severely ill-conditioned,
and regularization is needed in order to compute a stable approximation of the
solution of (1.1). A class of regularization methods is obtained through spectral
filtering [11, 12]. Specifically, if the spectral decomposition of $A$ is

$$A = T_n\, \mathrm{diag}(d)\, T_n^{-1}, \qquad T_n = [\, t_1\ t_2\ \cdots\ t_n \,],$$

with $d = h(\hat{G}_n)$, and $\tilde{t}_i^T$ denotes the $i$th row of $T_n^{-1}$, then a spectral filter
solution is given by

$$f_{reg} = \sum_{i=1}^{n} \phi_i\, \frac{\tilde{t}_i^T g}{d_i}\, t_i, \qquad (1.16)$$

where $\phi_i$ are filter factors that satisfy

$$\phi_i \approx \begin{cases} 1 & \text{if } d_i \text{ is large}, \\ 0 & \text{if } d_i \text{ is small}. \end{cases}$$

The small eigenvalues correspond to eigenvectors with high frequency components,
and are typically associated with the noise space, while the large eigenvalues
correspond to eigenvectors with low frequency components, and are
associated with the signal space. Thus filtering methods attempt to reconstruct
signal space components of the solution, while avoiding reconstruction of noise
space components.

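For example, the filter factors for two well known filtering methods, truncated
spectral value decomposition (TSVD) and Tikhonov regularization, are

$$\phi_i^{tsvd} = \begin{cases} 1 & \text{if } d_i \ge \delta, \\ 0 & \text{if } d_i < \delta, \end{cases} \qquad \text{and} \qquad \phi_i^{tik} = \frac{d_i^2}{d_i^2 + \lambda}, \quad \lambda > 0, \qquad (1.17)$$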



where the problem-dependent regularization parameters $\delta$ and $\lambda$ must be chosen
[12]. Several techniques can be used to estimate appropriate choices for the
regularization parameters when the SVD is used for filtering (i.e., $d_i$ are the singular
values), including generalized cross validation (GCV), L-curve, and the discrepancy
principle [9, 11, 19].
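The two families of filter factors are simple to express in code. The sketch below is ours (the function names are hypothetical); it takes the spectral values $d_i$ as a list and returns the corresponding $\phi_i$, with the threshold `delta` and the parameter `lam` standing for $\delta$ and $\lambda$.

```python
def tsvd_factors(d, delta):
    """Truncation filter: keep components with d_i >= delta, drop the others."""
    return [1.0 if di >= delta else 0.0 for di in d]

def tikhonov_factors(d, lam):
    """Tikhonov-type filter d_i^2 / (d_i^2 + lam), with lam > 0."""
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    return [di * di / (di * di + lam) for di in d]
```

Both filters leave large $d_i$ almost untouched and damp small ones; the Tikhonov factors do so smoothly, whereas truncation is a hard cut at $\delta$.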

In our case, the notation in (1.17) is a slight abuse of notation, because the
eigenvalues $d_i$ are not the singular values: in fact the Jordan canonical form (CF) in
(1.15) is different from the singular value decomposition (SVD), since the transform
$T_n$ is not orthogonal (indeed it is a rank-2 correction of a symmetric
orthogonal matrix). Therefore note that the use of $\phi_i^{tsvd}$ in (1.16) defines the
filtering of the eigenvalues in the Jordan canonical form instead of the more
classical filtering of the singular values in the SVD. However, we note that in
general computing the SVD can be computationally very expensive, especially in
the multidimensional case and also in the strongly symmetric case. Moreover,
quite surprisingly, a recent and quite exhaustive set of numerical tests, both in the
case of signals and images (see [16, 18]), has shown that the truncated Jordan
canonical form is more or less equivalent to the truncated SVD in terms of quality
of the restored object: indeed this is a delicate issue that deserves more attention in
the future.

Furthermore, the so-called Tikhonov regularization also needs further discussion
in this direction. Indeed its definition is related to the solution of the linear
system $(A^T A + \lambda I) f_{tik} = A^T g$, but the $\mathcal{AR}$ algebra is not closed under
transposition. Hence we cannot use the Jordan canonical form for computing the solution
of this linear system in a fast and stable way. In [6] it was proposed to replace $A^T$
by $A$: in such a way the associated Tikhonov-like linear system becomes
$(A^2 + \lambda I) f_{reg} = A g$, and now $f_{reg}$ is the one in (1.16) with the filter factors $\phi_i^{tik}$. In
[8] it has been shown that the considered approach, called reblurring, arises when
one first regularizes the continuous problem and then applies the anti-reflective
approximation to each operator separately.

Our final aim is to compute (1.16) in a fast and stable way. We can follow two 
strategies. 

Strategy 1. This is the classic approach implemented for instance with periodic
BCs by using three FFTs. In our case we employ the AR-transform, its inverse, and
a fast algorithm for computing the eigenvalues.

Algorithm 1

1. $\tilde{g} = T_n^{-1} g$,

2. $d = [h(0), \hat{d}^T, h(0)]^T$, where $\hat{d} = [d_2, \ldots, d_{n-1}]^T$ are the eigenvalues of $\tau_{n-2}(h)$
that can be computed by a fast sine transform (FST),

3. $\tilde{f} = (\phi\, ./\, d)\, .\!*\, \tilde{g}$, where the dot operations are component-wise,

4. $f_{reg} = T_n \tilde{f}$.

The product $T_n \tilde{f}$ can be clearly computed in a fast and stable way by one FST.
Indeed for all $x \in \mathbb{R}^n$

$$T_n x = \alpha_n^{-1} x_1 \begin{bmatrix} 1 \\ p \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ Q_{n-2}\, x(2 : n-1) \\ 0 \end{bmatrix} + \alpha_n^{-1} x_n \begin{bmatrix} 0 \\ Jp \\ 1 \end{bmatrix},$$

where $x(2 : n - 1)$ in Matlab notation is the vector $x$ with components indexed
from 2 to $n - 1$. A similar strategy can be followed for computing the matrix-vector
product $T_n^{-1} g$. Instead of $\alpha_n^{-1} p$ there is $u = -Q_{n-2}\,p$ and instead of $\alpha_n^{-1} Jp$
there is $w = -Q_{n-2}\,Jp$ (and $\alpha_n$ replaces $\alpha_n^{-1}$ in the first and last entries).
Recalling that $p = L_{n-2}^{-1} e_1$, the two vectors $u$ and $w$ can be
explicitly computed obtaining $u_i = -(2n - 2)^{-1/2} \cot\left(\frac{i\pi}{2n - 2}\right)$, for $i = 1, \ldots, n - 2$, and
$w = \mathrm{diag}_{i=1,\ldots,n-2}\left((-1)^{i+1}\right) u$.
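The four steps of Algorithm 1 can be sketched as follows. This is an illustrative reference implementation of ours, not the authors' code: the sine transform is applied as a dense $O(n^2)$ product instead of a fast transform, the eigenvalues are obtained by evaluating the symbol on the grid rather than by an FST, and the names `dst1`, `algorithm1` and the callable arguments `h` (symbol) and `filt` (map from eigenvalues to filter factors) are hypothetical.

```python
import math

def dst1(x):
    """Multiply by the symmetric orthogonal DST-I matrix (dense reference version)."""
    k = len(x)
    c = math.sqrt(2.0 / (k + 1))
    return [c * sum(math.sin((i + 1) * (j + 1) * math.pi / (k + 1)) * x[j]
                    for j in range(k)) for i in range(k)]

def algorithm1(g, h, filt):
    """Filtered solution via the AR-transform; assumes all eigenvalues are nonzero."""
    n = len(g)
    k = n - 2
    p = [1.0 - j / (n - 1.0) for j in range(1, k + 1)]
    alpha = math.sqrt(1.0 + sum(v * v for v in p))      # norm of [1, p, 0]
    # Step 1: g~ = T^{-1} g, using u = -Q p and w = -Q J p in closed form.
    c = 1.0 / math.sqrt(2.0 * n - 2.0)
    u = [-c / math.tan(i * math.pi / (2.0 * n - 2.0)) for i in range(1, k + 1)]
    w = [(-1.0) ** (i + 1) * u[i - 1] for i in range(1, k + 1)]
    qg = dst1(g[1:n - 1])
    gt = ([alpha * g[0]]
          + [qg[i] + u[i] * g[0] + w[i] * g[n - 1] for i in range(k)]
          + [alpha * g[n - 1]])
    # Step 2: eigenvalues d = h([0, G_{n-2}, 0]).
    d = [h(0.0)] + [h(j * math.pi / (n - 1)) for j in range(1, k + 1)] + [h(0.0)]
    # Step 3: component-wise filtering.
    phi = filt(d)
    ft = [phi[i] / d[i] * gt[i] for i in range(n)]
    # Step 4: f_reg = T f~.
    mid = dst1(ft[1:n - 1])
    a1, an = ft[0] / alpha, ft[n - 1] / alpha
    return [a1] + [mid[i] + a1 * p[i] + an * p[k - 1 - i] for i in range(k)] + [an]
```

With all filter factors equal to one the routine simply inverts the blurring matrix, which gives a convenient consistency check on noise-free data.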

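Strategy 2. Now we describe a slightly different approach for computing filtered
solutions. In particular, we see that the eigenvalues $d_1 = d_n = h(0)$ have
eigenvectors essentially belonging to the signal space (for more details, refer to the
subsequent Remark 4). Hence we set a priori $\phi_1 = \phi_n = 1$, and rewrite the filtered
solution as

$$f_{reg} = \frac{\tilde{t}_1^T g}{d_1}\, t_1 + \frac{\tilde{t}_n^T g}{d_n}\, t_n + \sum_{i=2}^{n-1} \phi_i\, \frac{\tilde{t}_i^T g}{d_i}\, t_i,$$

where $\tilde{t}_i^T$ is the $i$th row of $T_n^{-1}$. Now observe that, up to the normalization factor
$\alpha_n$ (which cancels in the products $t_1 \tilde{t}_1^T$ and $t_n \tilde{t}_n^T$), $\tilde{t}_1 = e_1$, $\tilde{t}_n = e_n$, and for
$i = 2, 3, \ldots, n - 1$, $t_i = [0, q_{i-1}^T, 0]^T$, where $q_j$ are columns of the DST-I matrix
$Q_{n-2}$. Thus, the filtered solution can be written as

$$f_{reg} = \frac{1}{h(0)} \left( g_1 \begin{bmatrix} 1 \\ p \\ 0 \end{bmatrix} + g_n \begin{bmatrix} 0 \\ Jp \\ 1 \end{bmatrix} \right) + \begin{bmatrix} 0 \\ \hat{f}_{reg} \\ 0 \end{bmatrix}, \qquad \hat{f}_{reg} = \sum_{i=2}^{n-1} \phi_i\, \frac{\tilde{t}_i^T g}{d_i}\, q_{i-1}.$$

Let $g = [g_1, \hat{g}^T, g_n]^T$, then

$$\hat{f}_{reg} = \sum_{i=2}^{n-1} y_{i-1}\, q_{i-1} = Q_{n-2}\, y,$$

where

$$y = \mathrm{diag}_{i=2,\ldots,n-1}\left( \frac{\phi_i}{d_i} \right) \left( Q_{n-2}\,\hat{g} - g_1\, Q_{n-2}\,p - g_n\, Q_{n-2}\,Jp \right).$$

Therefore

$$\hat{f}_{reg} = Q_{n-2}\, \mathrm{diag}_{i=2,\ldots,n-1}\left( \frac{\phi_i}{d_i} \right) Q_{n-2}\, \bar{g}, \qquad \text{where } \bar{g} = \hat{g} - g_1\, p - g_n\, Jp.$$

Since $h(0) = 1$ for a normalized PSF, the algorithm can be summarized as follows.

Algorithm 2

1. $\bar{g} = \hat{g} - g_1\, p - g_n\, Jp$,

2. $\hat{f}_{reg} = Q_{n-2}\, \mathrm{diag}_{i=2,\ldots,n-1}\left( \frac{\phi_i}{d_i} \right) Q_{n-2}\, \bar{g}$ by three FSTs,

3. $f_{reg} = \begin{bmatrix} 0 \\ \hat{f}_{reg} \\ 0 \end{bmatrix} + g_1 \begin{bmatrix} 1 \\ p \\ 0 \end{bmatrix} + g_n \begin{bmatrix} 0 \\ Jp \\ 1 \end{bmatrix}$.

We can compare the two strategies when $\phi_i$ are the two classic choices in
(1.17). Concerning the spectral truncation performed with $\phi_i^{tsvd}$, for every choice
of $\delta$ we have $\delta \le d_1 = d_n = \max_{i=1,\ldots,n} d_i$ and then $\phi_1^{tsvd} = \phi_n^{tsvd} = 1$. Therefore the
two strategies are exactly the same. On the other hand, for Tikhonov regularization
$\phi_1^{tik} = \phi_n^{tik} = \frac{h(0)^2}{h(0)^2 + \lambda} \ne 1$ and the two strategies are slightly different as already
observed in [4]. Indeed the first one arises from the reblurring approach proposed
in [6], while the second one is the same as described in [4] (see Remark 5).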
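The second strategy can be sketched along the same lines. Again this is an illustrative reference version of ours, with hypothetical names (`dst1`, `algorithm2`, the symbol `h` and the filter map `filt`): the boundary samples $g_1$ and $g_n$ are passed through with filter factor one, a linear correction built from $p_j = 1 - j/(n-1)$ is removed from the interior samples, and the remaining interior problem is filtered in the sine basis. Dense sine transforms are used for readability, and the general factor $1/h(0)$ is kept even though it equals one for a normalized PSF.

```python
import math

def dst1(x):
    """Multiply by the symmetric orthogonal DST-I matrix (dense reference version)."""
    k = len(x)
    c = math.sqrt(2.0 / (k + 1))
    return [c * sum(math.sin((i + 1) * (j + 1) * math.pi / (k + 1)) * x[j]
                    for j in range(k)) for i in range(k)]

def algorithm2(g, h, filt):
    """Filtered solution with the two boundary components left unfiltered."""
    n = len(g)
    k = n - 2
    p = [1.0 - j / (n - 1.0) for j in range(1, k + 1)]
    g1, gn = g[0] / h(0.0), g[n - 1] / h(0.0)
    # Step 1: remove the linear part carried by the boundary samples.
    gbar = [g[i + 1] - g[0] * p[i] - g[n - 1] * p[k - 1 - i] for i in range(k)]
    # Step 2: filter the interior part in the sine basis.
    d = [h(j * math.pi / (n - 1)) for j in range(1, k + 1)]
    phi = filt(d)
    y = dst1(gbar)
    fhat = dst1([phi[i] / d[i] * y[i] for i in range(k)])
    # Step 3: add back the two eigenvectors related to h(0).
    return [g1] + [fhat[i] + g1 * p[i] + gn * p[k - 1 - i] for i in range(k)] + [gn]
```

Whatever filter is passed in, the first and last entries of the result equal $g_1/h(0)$ and $g_n/h(0)$, which is exactly the a priori choice $\phi_1 = \phi_n = 1$.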
We close this section with a few remarks.

Remark 3 The difference between the two strategies increases with $\lambda$, hence it is
more evident when the problem requires a substantial amount of regularization,
while it is negligible for small values of $\lambda$. Furthermore the second approach
imposes $\phi_1 = \phi_n = 1$ a priori. Hence, for the implementation, we can also use
Algorithm 1 where at step 3 we set $\phi_1 = \phi_n = 1$. In this way the same vector $f_{reg}$
is computed, as in Algorithm 2.

Remark 4 As discussed in Remark 2, there is a natural interpretation in terms of
frequencies when considering one-dimensional periodic and reflective BCs. The
eigenvalue obtained as a sampling of the symbol $h$ at a grid-point close to zero, i.e.
close to the maximum point of $h$, has an associated eigenvector that corresponds to
low frequency (signal space) information, while the eigenvalue obtained as a
sampling of the symbol $h$ at a grid-point far away from zero (and, in particular,
close to $\pi$), has an associated eigenvector that corresponds to high frequency
(noise space) information. Concerning anti-reflective BCs, the same situation
occurs when dealing with the frequency eigenvectors $\sqrt{2/(n - 1)}\,(\sin((j-1)y))|_{\tilde{G}_n}$,
$j = 2, \ldots, n - 1$. The other two exceptional eigenvectors generate the space of
linear polynomials and therefore they correspond to low frequency information:
this intuition is well supported by the fact that the related eigenvalue is $h(0)$, i.e.
the maximum and the infinity norm of $h$, and by the fact that AR-BCs are more
precise than other classical BCs.

Remark 5 The matrix $Q_{n-2}\,\mathrm{diag}_{i=2,\ldots,n-1}(\phi_i/d_i)\,Q_{n-2}$ is the $\tau$ matrix with
eigenvalues $\phi_i/d_i$, for $i = 2, \ldots, n - 1$. Therefore step 2 in Algorithm 2 is equivalent to
regularizing a linear system with coefficient matrix $\tau_{n-2}(h)$ corresponding to the
inner part of $A = AR_n(h)$. It is straightforward to see that this strategy is exactly the
approach used in [4] with homogeneous AR-BCs. Obviously, as already discussed
in Remark 1, the two eigenvectors that complete the sine basis can be chosen in
several ways: for instance in [4], instead of $[1, p^T, 0]^T$ and $[0, (Jp)^T, 1]^T$, the
authors prefer to consider the first vector and the vector $e$ with all components
equal to one, since

$$e = \begin{bmatrix} 1 \\ p \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ Jp \\ 1 \end{bmatrix}.$$
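The identity behind this alternative choice follows from the explicit form $p_j = 1 - j/(n-1)$; the one-line verification below is our own worked check.

```latex
% (Jp)_j = p_{n-1-j} for j = 1, ..., n-2, with p_j = 1 - j/(n-1).
\begin{equation*}
p_j + (Jp)_j = \Bigl(1 - \frac{j}{n-1}\Bigr) + \Bigl(1 - \frac{n-1-j}{n-1}\Bigr)
             = 1 - \frac{j}{n-1} + \frac{j}{n-1} = 1,
\qquad j = 1, \ldots, n-2,
\end{equation*}
% and the first and last entries of the sum are 1 + 0 and 0 + 1, so the sum is the all-ones vector e.
```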



1.6 A Numerical Comparison of the Two Strategies 

For Tikhonov regularization, a comparison between the two strategies described in
the previous section is already provided in [4] Sect. 1.6. However, we report an
example where, according to Remark 3, we explicitly compare both strategies
varying $\lambda$. Since from a computational point of view they are equivalent, we will
compare only the quality of the restored signals.

In our example we use the true signal and the out of focus PSF shown in 
Fig. 1.1. The out of focus blurring is well-known to be severely ill-conditioned. 

The two dotted vertical lines shown in the figure of the true signal denote the
field of view of our signal; that is, the signal inside the dotted lines represents that
part of the signal that can be directly observed, while the signal extending outside
the dotted lines represents information that cannot be directly observed, and which
must be approximated through boundary conditions. To the blurred signal we add
white Gaussian noise (i.e., normally distributed random values with mean 0 and
variance 1) with a percentage $\|\eta\|_2 / \|f_{blur}\|_2$, where $f_{blur}$ is the (noise free) blurred
signal and $\eta$ is the noise. We consider two different levels of noise, 1% and 10%. The
observed signals are shown in Fig. 1.2.
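One common way to obtain a prescribed noise percentage is to draw standard Gaussian values and rescale them so that the ratio of the 2-norms matches the target. The sketch below assumes this rescaling (the text does not spell out the exact procedure), and the function name `add_relative_noise` and the `seed` parameter are ours.

```python
import math
import random

def add_relative_noise(f_blur, level, seed=0):
    """Return (noisy signal, noise) with ||noise||_2 / ||f_blur||_2 equal to level."""
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, 1.0) for _ in f_blur]       # mean 0, variance 1
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    scale = level * norm(f_blur) / norm(eta)          # rescale to the target percentage
    noise = [scale * e for e in eta]
    return [a + b for a, b in zip(f_blur, noise)], noise
```

For example, `add_relative_noise(f_blur, 0.01)` and `add_relative_noise(f_blur, 0.10)` would correspond to the 1% and 10% cases.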

Clearly the problem with 10% noise requires a stronger regularization than the
problem with 1% noise. For both strategies, in Fig. 1.3 we show the restored
signals, while in Fig. 1.4 we report the logarithmic plot of the relative restoration
errors varying $\lambda$. In the legends the first strategy is called "re-blurring" according
to the terminology of [6] where this idea was proposed, while the second strategy
is called "homogeneous AR" according to the terminology in [4], where this
variation of the theme was discussed. For this example, we note that for a low level
of noise, i.e. in our case 1%, the two strategies are equivalent. Indeed the two
restored signals are not distinguishable in Fig. 1.3. Moreover in Fig. 1.4 we
observe that the second strategy becomes superior with respect to the first only for
A > 10"^, while the optimum is reached for a < 10"-. On the other hand, for higher 
levels of noise, i.e. for instance 10%, we need a more substantial regularization
and hence a larger value of $\lambda$. From Fig. 1.4, by looking in a neighborhood of the
optimal value of $\lambda$, we notice that the second procedure becomes more precise and
in fact the restored signal is computed with a lower error norm. However, in
Fig. 1.3 we can see that the quality of the restored signal is not appreciably improved.
Finally, we compare the two strategies by varying the noise level. Figure 1.5
shows in logarithmic scale the optimal relative restoration error changing the noise
level for both techniques. When the noise is lower than 10%, the two strategies
achieve about the same minimal restoration error. For a noise level greater than
10%, we observe a different minimum and the second proposal seems to be slightly
better.

Fig. 1.5 Optimal relative restoration errors vs noise

In the previous example, the second strategy seems to be slightly superior with
respect to the first one, when the problem requires stronger regularization. On the
other hand, when the optimal value of $\lambda$ is small, the considered procedures are
essentially similar, according to the theoretical discussion in Sect. 1.5.



1.7 Multilevel Extension 



Here we provide some comments on the extension of our findings to $d$-dimensional
objects with $d > 1$. Note when $d = 1$, h is a vector, when $d = 2$, h is a 2D array,
when $d = 3$, h is a 3D tensor, etc. For $d = 1$ and with reference to the previous
sections, we have proved that, thanks to the definition of a (fast) AR-transform, it
is possible to define a truncated spectral decomposition. However we are well
aware that the real challenge is represented by a general extension to the
multidimensional setting. This is the topic that we briefly discuss in the rest of the
section.

With reference to Sect. 1.2.2 we propose a (canonical) multidimensional
extension of the algebras $\mathcal{AR}$ and of the operators $AR_n(\cdot)$, $n = (n_1, \ldots, n_d)$: the
idea is to use tensor products. If $h = h(\cdot)$ is a $d$-variate real-valued cosine polynomial,
then its Fourier coefficients form a real $d$-dimensional tensor which is
strongly symmetric. In addition, $h(y)$, $y = (y_1, \ldots, y_d)$, can be written as a linear
combination of independent terms of the form $m(y) = \prod_{j=1}^{d} \cos(c_j y_j)$ where any $c_j$
is a nonnegative integer. Therefore, we define

$$AR_n(m(y)) = AR_{n_1}(\cos(c_1 y_1)) \otimes \cdots \otimes AR_{n_d}(\cos(c_d y_d)), \qquad (1.18)$$

where $\otimes$ denotes Kronecker product, and we force

$$AR_n(\alpha h_1 + \beta h_2) = \alpha AR_n(h_1) + \beta AR_n(h_2) \qquad (1.19)$$

for every real $\alpha$ and $\beta$ and for every $d$-variate real-valued cosine polynomials
$h_1 = h_1(\cdot)$ and $h_2 = h_2(\cdot)$. It is clear that the request that $AR_n(\cdot)$ is a linear operator
(for $d > 1$, we impose this property in (1.19) by definition) is sufficient for defining
completely the operator in the $d$-dimensional setting.

With the above definition of the operator $AR_n(\cdot)$, we have

1. $\alpha AR_n(h_1) + \beta AR_n(h_2) = AR_n(\alpha h_1 + \beta h_2)$,

2. $AR_n(h_1) AR_n(h_2) = AR_n(h_1 h_2)$,

for real $\alpha$ and $\beta$ and for cosine functions $h_1 = h_1(\cdot)$ and $h_2 = h_2(\cdot)$.

The latter properties of algebra homomorphism allow us to define a commutative
algebra $\mathcal{AR}$ of the matrices $AR_n(h)$, with $h(\cdot)$ being a $d$-variate cosine polynomial.
By standard interpolation arguments it is easy to see that $\mathcal{AR}$ can be defined as the
set of matrices $AR_n(h)$, where $h$ is a $d$-variate cosine polynomial of degree at most
$n_j - 3$ in the $j$th variable for every $j$ ranging in $\{1, \ldots, d\}$: we denote the latter
polynomial set by $\mathcal{P}_{n-2e}$, with $e$ being the vector of all ones. Here we have to be
a bit careful in specifying the meaning of algebra when talking of polynomials.
More precisely, for $h_1, h_2 \in \mathcal{P}_{n-2e}$ the product $h_1 \cdot h_2$ is the unique polynomial
$h \in \mathcal{P}_{n-2e}$ satisfying the following interpolation condition

$$h(y) = z_y, \qquad z_y = h_1(y)\, h_2(y), \qquad \forall y \in G_{n-2e}^{(d)}. \qquad (1.20)$$

If the degree of $h_1$ plus the degree of $h_2$ in the $j$th variable does not exceed
$n_j - 2$, $j = 1, \ldots, d$, then the uniqueness of the interpolant implies that $h$ coincides
with the product between polynomials in the usual sense. The uniqueness
holds also for $d \ge 2$ thanks to the tensor form of the grid $G_{n-2e}^{(d)}$ (see [1] for more
details). The very same idea applies when considering inversion. In conclusion,
with this careful definition of the product/inversion and with the standard
definition of addition, $\mathcal{P}_{n-2e}$ has become an algebra, showing the vector-space
dimension equal to $(n_1 - 2) \cdot (n_2 - 2) \cdots (n_d - 2)$ which coincides with
that of $\mathcal{AR}$.

Without loss of generality and for the sake of notational clarity, in the following
we assume $n_j = n$ for $j = 1, \ldots, d$. Thanks to the tensor structure emphasized in
(1.18-1.19), and by using Theorem 2 for every term $AR_n(\cos(c_j y_j))$, $j = 1, \ldots, d$,
of $AR_n(m)$ the $d$-level extension of such a theorem easily follows. More precisely,
if $h$ is a $d$-variate real-valued cosine symbol related to a $d$-dimensional strongly
symmetric and normalized mask h, then

$$AR_n(h) = T_n^{(d)} D_n \left(T_n^{(d)}\right)^{-1}, \qquad T_n^{(d)} = T_n \otimes \cdots \otimes T_n, \qquad (1.21)$$

($d$ times) where $D_n$ is the diagonal matrix containing the eigenvalues of $AR_n(h)$.
The description of $D_n$ in $d$ dimensions is quite involved when compared with the
case $d = 1$, implicitly reported in Theorem 1.
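The mechanism behind the tensor construction is that eigenpairs of the one-level factors combine multiplicatively under the Kronecker product. The generic sketch below is ours and does not use AR matrices: the two small matrices and their eigenvectors are hypothetical data chosen only to illustrate the rule, and `kron` and `matvec` are helper names we introduce.

```python
def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

# Hypothetical factors with known eigenpairs.
a = [[2.0, 0.0], [1.0, 3.0]]
xa = [0.0, 1.0]                      # a xa = 3 xa
b = [[1.0, 4.0], [0.0, 5.0]]
xb = [1.0, 0.0]                      # b xb = 1 xb

x = [u * v for u in xa for v in xb]  # Kronecker product of the two eigenvectors
lhs = matvec(kron(a, b), x)          # equals (3 * 1) * x
```

The same rule, applied factor by factor to the columns of $T_n \otimes \cdots \otimes T_n$, is what pairs each column with a product of one-level eigenvalues.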
For a complete analysis of the spectrum of $AR_n(h)$ we refer the reader to [1].
Here we give details on a specific aspect. More precisely we attribute a
correspondence in a precise and simple way among eigenvalues and eigenvectors, by
making recourse only to the main $d$-variate symbol $h(\cdot)$. Let $x_n = x_n^{(1)} \otimes x_n^{(2)} \otimes
\cdots \otimes x_n^{(d)}$ be a column of $T_n^{(d)}$, with $x_n^{(j)} \in \{\alpha_n^{-1}[1, p^T, 0]^T,\ \alpha_n^{-1}[0, (Jp)^T, 1]^T\}$ or
$x_n^{(j)} = [0, q_{s_j}^T, 0]^T$, $1 \le s_j \le n - 2$ and $q_{s_j}$ is the $(s_j)$th column of $Q_{n-2}$, for
$j = 1, \ldots, d$. Let

$$\mathcal{F}_{x_n} = \left\{ j \ \middle|\ x_n^{(j)} = \alpha_n^{-1}[1, p^T, 0]^T \ \text{or}\ x_n^{(j)} = \alpha_n^{-1}[0, (Jp)^T, 1]^T \right\} \subset \{1, \ldots, d\},$$

with $x_n$ being the generic eigenvector, i.e., the generic column of $T_n^{(d)}$. The
eigenvalue related to $x_n$ is

$$\lambda = h\left(y_1^{(n)}, \ldots, y_d^{(n)}\right) \qquad (1.22)$$

where $y_j^{(n)} = 0$ for $j \in \mathcal{F}_{x_n}$ and $y_j^{(n)} = \frac{\pi s_j}{n-1}$ for $j \notin \mathcal{F}_{x_n}$. We define the $d$-dimensional
grid

$$\hat{G}_n^{(d)} = \hat{G}_n \circledast \cdots \circledast \hat{G}_n, \quad d \text{ times}, \qquad (1.23)$$

as a vector of length $n^d$ whose entries are $d$-tuples. More precisely given two
vectors $x$ and $y$ of size $p$ and $q$, respectively, whose entries are $l$-tuples and
$m$-tuples, respectively, the operation $z = x \circledast y$ produces a new vector of size $pq$
containing all possible chains $x_i \circ y_j$, $i = 1, \ldots, p$, $j = 1, \ldots, q$: in this way the
entries of $z$ are tuples of length $l + m$ and the ordering of the entries in $z$ is the
same as that of the standard Kronecker product. In other words and shortly, we can
claim that the operation $\circledast$ is the same as $\otimes$ where the standard product between
elements is replaced by the chain operation between tuples. Hence we can evaluate
the $d$-variate function $h$ on $\hat{G}_n^{(d)}$ since its entries are points belonging to the
definition domain of $h$. Using this notation the following compact and elegant
result can be stated (its proof is omitted since it is simply the combination of the
eigenvalue analysis in [1], of Theorem 2, and of the previous tensor arguments).
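The chain operation on grids has a direct counterpart in code: a Cartesian product whose last coordinate varies fastest, which is the ordering of the standard Kronecker product. The sketch below is ours (the names `ar_grid` and `tuple_grid` are hypothetical) and assumes the one-level grid $[0, G_{n-2}^T, 0]^T$ with $G_{n-2} = [j\pi/(n-1)]_{j=1}^{n-2}$.

```python
import math
from itertools import product

def ar_grid(n):
    """One-level grid [0, G_{n-2}, 0] with G_{n-2} = [j*pi/(n-1)], j = 1..n-2."""
    return [0.0] + [j * math.pi / (n - 1) for j in range(1, n - 1)] + [0.0]

def tuple_grid(n, d):
    """d-fold chain product of the one-level grid; entries are d-tuples."""
    # itertools.product varies the last coordinate fastest, as the Kronecker product does.
    return list(product(ar_grid(n), repeat=d))

def eigenvalues(h, n, d):
    """Evaluate a d-variate symbol h (taking a d-tuple) on the tuple grid."""
    return [h(y) for y in tuple_grid(n, d)]
```

Since each one-level grid contains the point zero twice, the all-zero tuple appears $2^d$ times in the $d$-level grid.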

Theorem 3 [$AR_n(\cdot)$ Jordan Canonical Form.] The $n^d \times n^d$ AR-BC blurring matrix
$A$, obtained when using a strongly symmetric $d$-dimensional mask h such that
$h_i = 0$ if $|i_j| \ge n - 2$ for some $j \in \{1, \ldots, d\}$ (the $d$-dimensional degree condition),
$n \ge 3$, coincides with

$$AR_n(h) = T_n^{(d)}\, \mathrm{diag}\left(h\left(\hat{G}_n^{(d)}\right)\right) \left(T_n^{(d)}\right)^{-1}, \qquad (1.24)$$

where $T_n^{(d)}$ and $\hat{G}_n^{(d)}$ are defined in (1.21) and (1.23).

This shows that the study of the anti-reflective transform is not only of interest
in itself both from theoretical and computational viewpoints, but it is also useful in
simplifying the analysis and the interpretation of the eigenvalues studied in [1]: in
this sense compare the elegant result in Theorem 3 and the quite tricky
combinatorial representation in [1].

It is finally worth observing that, except for more involved multi-index notations,
both Algorithms 1 and 2 in Sect. 1.5 are plainly generalized in the multilevel
setting, maintaining a cost proportional to three $d$-level FSTs of size $(n - 2)^d$, and
the key tool is the simplified eigenvalue-eigenvector correspondence concisely
indicated in Theorem 3. Indeed, for Algorithm 1 the only difficult task is the
computation in step 2, where we have to compute the eigenvalues in the right
order. For this task we refer to [1], where an algorithm is proposed and studied:
more specifically the related procedure in [1] is based on a single $d$-level FST of
size $(n - 2)^d$ plus lower order computations. For the second strategy in Sect. 1.5,
we can operate as suggested in Remark 3. We employ Algorithm 1 where we fix
a priori $\phi_i = 1$, when the corresponding grid point in $\hat{G}_n^{(d)}$ is equal to zero (notice
that, according to (1.23) and to the definition of $\hat{G}_n$, there exist $2^d$ such points).

We conclude this section with an example illustrating the approach discussed
above for a two-dimensional imaging problem. We do not undertake an extensive
comparison of the AR-BCs with other classic BCs, like periodic or reflective, since
the topic and related issues have been already widely discussed in several works
(see e.g. [4, 6, 8]), where the advantage on some classes of images, in terms of the
restored image quality, of the application of AR-BCs has been emphasized. Here
we propose only a 2D image deblurring example with Gaussian blur and various
levels of white Gaussian noise.

The true and the observed images are in Fig. 1.6, where the observed image is 
affected by a Gaussian blur and 1% noise. We compare the AR-BCs only with the 
reflective BCs since for this test other BCs like periodic or Dirichlet do not 
produce satisfactory restorations. In Fig. 1.7 we observe a better restoration and 
reduced ringing effects at the edges for AR-BCs with respect to reflective BCs. 



Fig. 1.6 Test problem with Gaussian blur and 1% white Gaussian noise (panels: true image, observed image)

Fig. 1.7 Restored images for the test problem in Fig. 1.6 (panels: reflective, anti-reflective)


Table 1.1 Minimum relative restoration errors for the image restoration problem

Noise (%)    Reflective    AR strategy 1    AR strategy 2
10           0.1284        0.1261           0.1271
1            0.1188        0.1034           0.1034
0.1          0.1186        0.0989           0.0989



Restored images in Fig. 1.7 are those attaining the minimum relative restoration
error over several values of the regularization parameter $\lambda$.

In the previous example, the choice between the two strategies for AR-BCs is 
not important since, as already observed in Sect. 1.6, they provide different res- 
torations only for a high noise level. This fact is evident in Table 1.1 where we 
compare the minimum relative restoration errors for the reflective BCs and the two 
strategies for AR-BCs. As already noted, the two approaches differ only for the 
four values of the filter factors corresponding to the vertices of the image. We note 
that for the 10% noise case, all of the approaches give comparable restorations. On 
the other hand, decreasing the noise, i.e., passing to 1% and then to 0.1% noise, the 
AR-BCs improve the restoration while the reflective BCs are not able to do that, 
due to the barrier of the ringing effects.

Chapter 2
Classifications of Recurrence Relations via Subclasses of (H, m)-quasiseparable Matrices

T. Bella, V. Olshevsky and P. Zhlobich



Abstract The results on characterization of orthogonal polynomials and Szegő
polynomials via tridiagonal matrices and unitary Hessenberg matrices, respectively,
are classical. In a recent paper we observed that tridiagonal matrices and
unitary Hessenberg matrices both belong to a wider class of (H, 1)-quasiseparable
matrices and derived a complete characterization of the latter class via polynomials
satisfying certain EGO-type recurrence relations. We also established a
characterization of polynomials satisfying three-term recurrence relations via
(H, 1)-well-free matrices and of polynomials satisfying the Szegő-type two-term
recurrence relations via (H, 1)-semiseparable matrices. In this paper we generalize
all of these results from the scalar (H, 1) case to the block (H, m) case. Specifically, we
provide a complete characterization of (H, m)-quasiseparable matrices via polynomials
satisfying block EGO-type two-term recurrence relations. Further,
(H, m)-semiseparable matrices are completely characterized by the polynomials
obeying block Szegő-type recurrence relations. Finally, we completely characterize
polynomials satisfying m-term recurrence relations via a new class of
matrices called (H, m)-well-free matrices.

2.1 Introduction

2.1.1 Classical Three-term and Two-term Recurrence Relations 
and Their Generalizations 



It is well known that real-orthogonal polynomials $\{r_k(x)\}$ satisfy three-term
recurrence relations of the form

\[ r_k(x) = (\alpha_k x - \delta_k)\, r_{k-1}(x) - \gamma_k \cdot r_{k-2}(x), \qquad \alpha_k \neq 0,\ \gamma_k > 0. \tag{2.1} \]

It is also well known that Szegő polynomials $\{\phi_k^{\#}(x)\}$, or polynomials orthogonal
not on a real interval but on the unit circle, satisfy slightly different three-term
recurrence relations of the form

\[ \phi_k^{\#}(x) = \left[ \frac{1}{\mu_k}\, x + \frac{\rho_k}{\rho_{k-1}} \frac{1}{\mu_k} \right] \phi_{k-1}^{\#}(x) - \frac{\rho_k}{\rho_{k-1}} \frac{\mu_{k-1}}{\mu_k}\, x\, \phi_{k-2}^{\#}(x). \tag{2.2} \]

Noting that the essential difference between these two sets of recurrence relations
is the presence or absence of the x dependence in the (k - 2)th polynomial, it is
natural to consider the more general three-term recurrence relations of the form

\[ r_k(x) = (\alpha_k x - \delta_k) \cdot r_{k-1}(x) - (\beta_k x + \gamma_k) \cdot r_{k-2}(x), \tag{2.3} \]

containing both (2.1) and (2.2) as special cases, and to classify the polynomials
satisfying such three-term recurrence relations.
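As a small illustration of the general three-term form (2.3), the sketch below generates the coefficient lists of $r_0, \ldots, r_n$ from user-supplied coefficient functions. The function names, the low-degree-first list representation, the starting values $r_0 = 1$ and $r_{-1} = 0$, and the choice of the Chebyshev polynomials as a test case are assumptions made for this example only.

```python
def poly_add(p, q):
    """Add two polynomials stored as coefficient lists, lowest degree first."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_scale(p, c):
    """Multiply a polynomial by the scalar c."""
    return [c * a for a in p]

def poly_shift(p):
    """Multiply a polynomial by x."""
    return [0] + list(p)

def general_three_term(n, alpha, delta, beta, gamma):
    """r_k = (alpha_k x - delta_k) r_{k-1} - (beta_k x + gamma_k) r_{k-2}, cf. (2.3).

    alpha, delta, beta, gamma are functions of k; r_0 = 1 and r_{-1} = 0 are assumed.
    """
    r = [[1]]
    for k in range(1, n + 1):
        prev = r[k - 1]
        prev2 = r[k - 2] if k >= 2 else [0]
        term1 = poly_add(poly_scale(poly_shift(prev), alpha(k)),
                         poly_scale(prev, -delta(k)))
        term2 = poly_add(poly_scale(poly_shift(prev2), beta(k)),
                         poly_scale(prev2, gamma(k)))
        r.append(poly_add(term1, poly_scale(term2, -1)))
    return r

# Chebyshev polynomials of the first kind: T_1 = x T_0, T_k = 2x T_{k-1} - T_{k-2}.
CHEB = general_three_term(
    4,
    alpha=lambda k: 1 if k == 1 else 2,
    delta=lambda k: 0,
    beta=lambda k: 0,
    gamma=lambda k: 1,
)
```

With $\beta_k = 0$ the relations reduce to the classical form (2.1); a nonzero $\beta_k$ introduces the x dependence in the (k - 2)th polynomial discussed above.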

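Also, in addition to the three-term recurrence relations (2.2), Szegő polynomials
satisfy two-term recurrence relations of the form

\[ \begin{bmatrix} \phi_k(x) \\ \phi_k^{\#}(x) \end{bmatrix} = \frac{1}{\mu_k} \begin{bmatrix} 1 & -\rho_k^* \\ -\rho_k & 1 \end{bmatrix} \begin{bmatrix} \phi_{k-1}(x) \\ x\,\phi_{k-1}^{\#}(x) \end{bmatrix} \tag{2.4} \]

for some auxiliary polynomials $\{\phi_k(x)\}$ (see, for instance, [18, 20]). By relaxing
these relations to the more general two-term recurrence relations

\[ \begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & 1 \end{bmatrix} \begin{bmatrix} G_{k-1}(x) \\ (\delta_k x + \theta_k)\, r_{k-1}(x) \end{bmatrix} \tag{2.5} \]

it is again of interest to classify the polynomials satisfying these two-term
recurrence relations.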

In [8], these questions were answered, and the desired classifications were given
in terms of the classes of matrices $A = [a_{i,j}]_{i,j=1}^{n}$ related to the polynomials $\{r_k(x)\}$
via

\[ r_k(x) = \frac{1}{a_{1,0}\, a_{2,1} \cdots a_{k+1,k}} \det (xI - A)_{(k \times k)}, \qquad k = 0, \ldots, n, \tag{2.6} \]

where $A_{(k \times k)}$ denotes the k × k leading principal submatrix of A. Note that this relation
involves the entries of the matrix A and two additional parameters $a_{1,0}$ and $a_{n+1,n}$
outside the range of parameters of A. In the context of this paper, these parameters
not specified by the matrix A can be any nonzero numbers.¹ These classifications
generalized the well-known facts that real-orthogonal polynomials and Szegő
polynomials were related to irreducible tridiagonal matrices and almost unitary
Hessenberg matrices, respectively, via (2.6). These facts as well as the classifications
of polynomials satisfying (2.3), (2.5) and a third set to be introduced later,
respectively, are given in Table 2.1.

Table 2.1 Correspondence between recurrence relations satisfied by polynomials and related subclasses of quasiseparable matrices, from [8]

Recurrence relations                        Matrices
Real-orthogonal three-term (2.1)            Irreducible tridiagonal matrices
Szegő two-term (2.4)/three-term (2.2)       Almost unitary Hessenberg matrices
General three-term (2.3)                    (H, 1)-well-free (Definition 4)
Szegő-type two-term (2.5)                   (H, 1)-semiseparable (Definition 3)
EGO-type two-term (2.16)                    (H, 1)-quasiseparable (Definition 1)
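To make the relation (2.6) concrete, the following sketch evaluates $r_k(x)$ from the determinants of the leading principal submatrices of a small Hessenberg matrix and compares the result with a classical three-term recurrence. Everything here is illustrative: the helper names, the 0-based storage of A, the choice $a_{1,0} = 1$, and the tridiagonal test matrix (chosen so that the resulting polynomials are the Chebyshev polynomials of the first kind) are assumptions, not part of the original text.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (adequate for tiny matrices)."""
    if not M:
        return Fraction(1)  # determinant of the empty 0 x 0 matrix
    total = Fraction(0)
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def r_via_determinant(A, k, x, a10=Fraction(1), a_last=Fraction(1)):
    """r_k(x) = det(xI - A)_(k x k) / (a_{1,0} a_{2,1} ... a_{k+1,k}), cf. (2.6).

    a10 and a_last play the role of the two free parameters a_{1,0} and a_{n+1,n}.
    """
    n = len(A)
    lead = [[(x if i == j else 0) - A[i][j] for j in range(k)] for i in range(k)]
    scale = a10
    for i in range(1, k + 1):
        # A[i][i-1] (0-based) is the subdiagonal entry a_{i+1,i} of the text.
        scale *= A[i][i - 1] if i < n else a_last
    return det(lead) / scale

H = Fraction(1, 2)
CHEBYSHEV_MATRIX = [
    [0, H, 0, 0],
    [1, 0, H, 0],
    [0, H, 0, H],
    [0, 0, H, 0],
]
CHEBYSHEV_MATRIX = [[Fraction(v) for v in row] for row in CHEBYSHEV_MATRIX]

x0 = Fraction(3, 7)
FROM_DET = [r_via_determinant(CHEBYSHEV_MATRIX, k, x0, a_last=H) for k in range(5)]
```

The same values are obtained from $T_0 = 1$, $T_1 = x$, $T_k = 2x T_{k-1} - T_{k-2}$, which is the tridiagonal row of Table 2.1 in miniature.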

Furthermore, the classes of matrices listed in Table 2.1 (and formally defined 
below) were shown in [8] to be related as is shown in Fig. 2.1. 

While it is likely that the reader is familiar with tridiagonal and unitary Hes- 
senberg matrices, and perhaps quasiseparable and semiseparable matrices, the 
class of well-free matrices is less well-known. We take a moment to give a brief 
description of this class (a more rigorous description is provided in Sect. 2.5.1). A 
matrix is well-free provided it has no columns that consist of all zeros above (but 
not including) the main diagonal, unless that column of zeros lies to the left of a 
block of all zeros. That is, no columns of the form shown in Fig. 2.2 appear in the 
matrix. 

As stated in Table 2.1, it was shown in [8] that the matrices related to polynomials
satisfying recurrence relations of the form (2.3) are not just well-free, but
(H, 1)-well-free; i.e., they are well-free and also have an (H, 1)-quasiseparable
structure, which is defined next.



¹ More details on the meaning of these numbers will be provided in Sect. 2.2.1.

[Figure: set diagram with regions labeled "(H, 1)-quasiseparable matrices, r.r. (16)", "(H, 1)-well-free matrices, r.r. (3)", "Irreducible tridiagonal matrices, r.r. (1)", "(H, 1)-semiseparable matrices, r.r. (5)", "Unitary Hessenberg matrices, r.r. (4) only", and "Unitary Hessenberg matrices, r.r. (4) & (2)"]

Fig. 2.1 Relations between subclasses of (H, 1)-quasiseparable matrices, from [8]

[Figure: a column of zeros above the main diagonal, adjacent to a block labeled "something nonzero"]

Fig. 2.2 Illustration of a well

2.1.2 Main Tool: Quasiseparable Structure

In this section we give a definition of the structure central to the results of this
paper, and explain one of the results shown above in Table 2.1. We begin with the
definition of (H, m)-quasiseparability.

Definition 1 ((H, m)-quasiseparable matrices). Let A be a strongly upper Hessenberg
matrix (i.e. upper Hessenberg with nonzero subdiagonal elements: $a_{i,j} = 0$
for $i > j + 1$, and $a_{i+1,i} \neq 0$ for $i = 1, \ldots, n - 1$). Then over all symmetric
partitions of the form

\[ A = \begin{bmatrix} * & A_{12} \\ * & * \end{bmatrix}, \]

(i) if $\max \operatorname{rank} A_{12} = m$, then A is (H, m)-quasiseparable, and
(ii) if $\max \operatorname{rank} A_{12} \le m$, then A is weakly (H, m)-quasiseparable.

For instance, the rank m blocks (respectively rank at most m blocks) of a
5 × 5 (H, m)-quasiseparable matrix (respectively weakly (H, m)-quasiseparable
matrix) would be those shaded below:

[Illustration: copies of a 5 × 5 upper Hessenberg matrix, one for each symmetric partition, with the block $A_{12}$ shaded]

² $A_{12} = A(1:k, k+1:n)$, $k = 1, \ldots, n - 1$, in the MATLAB notation.

2.1.3 Motivation to Extend Beyond the (H, 1) Case

In this paper, we extend the results of these classifications to include more general
recurrence relations. Such generalizations are motivated by several examples for
which the results of [8] are inapplicable as they are not of order (H, 1); one of the
simplest of such is presented next.

Considering the three-term recurrence relations (2.1), one could ask what classes
of matrices are related to polynomials satisfying such recurrence relations if more
than three terms are included. More specifically, consider recurrence relations of
the form

\[ x \cdot r_{k-1}(x) = a_{k,k}\, r_k(x) - a_{k-1,k}\, r_{k-1}(x) - \cdots - a_{k-(l-1),k} \cdot r_{k-(l-1)}(x). \tag{2.7} \]

It will be shown that this class of so-called l-recurrent polynomials is related via
(2.6) to (1, l - 2)-banded matrices (i.e., one nonzero subdiagonal and l - 2 nonzero
superdiagonals) of the form

\[ A = \begin{bmatrix}
a_{0,1} & \cdots & a_{0,l-1} & 0 & \cdots & 0 \\
a_{1,1} & a_{1,2} & \cdots & a_{1,l} & \ddots & \vdots \\
0 & a_{2,2} & \ddots & & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & & a_{n-(l-1),n} \\
\vdots & & \ddots & a_{n-2,n-2} & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & a_{n-1,n-1} & a_{n-1,n}
\end{bmatrix}. \tag{2.8} \]

This equivalence cannot follow from the results of [8] as summarized in Table 2.1
because those results are limited to the simplest (H, 1)-quasiseparable case. As we
shall see in a moment, the matrix A of (2.8) is (H, l - 2)-quasiseparable.

Considering the motivating example of the matrix A of (2.8), it is easy to see
that the structure forces many zeros into the blocks $A_{12}$ of Definition 1 (the shaded
blocks above), and hence the ranks of these blocks can be small compared to their
size. It can be seen that in the case of a (1, m)-banded matrix, the blocks $A_{12}$
have rank at most m, and so the matrix is (H, m)-quasiseparable.

This is only one simple example of a need to extend the results listed in
Table 2.1 from the scalar (H, 1)-quasiseparable case to the block (H, m)-quasiseparable
case.
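The rank claim for banded matrices can be checked numerically on small cases. The sketch below computes $\max_k \operatorname{rank} A(1:k, k+1:n)$ over all symmetric partitions of Definition 1 with exact rational arithmetic; the helper names and the test matrices (bands filled with ones) are assumptions made for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    ncols = len(M[0]) if M else 0
    rk = 0
    for c in range(ncols):
        pivot = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][c] / M[rk][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def hessenberg_order(A):
    """max over k of rank A(1:k, k+1:n), written with 0-based slices."""
    n = len(A)
    return max(rank([row[k:] for row in A[:k]]) for k in range(1, n))

def banded(n, m):
    """n x n matrix with one subdiagonal, the diagonal, and m superdiagonals of ones."""
    return [[1 if -1 <= j - i <= m else 0 for j in range(n)] for i in range(n)]

ORDER_TRIDIAGONAL = hessenberg_order(banded(5, 1))
ORDER_BAND_2 = hessenberg_order(banded(6, 2))
ORDER_BAND_3 = hessenberg_order(banded(7, 3))
```

For these sample matrices the maximal rank equals the number of superdiagonals, consistent with a (1, m)-banded matrix being (H, m)-quasiseparable (or weakly so when the ranks happen to be smaller).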



2.1.4 Main Results 



The main results of this paper are summarized next by Table 2.2 and Fig. 2.3,
analogues of Table 2.1 and Fig. 2.1 above, for the most general case considered in
this paper.

Table 2.2 Correspondence between polynomial systems and subclasses of (H, m)-quasiseparable matrices

Source        Recurrence relations                       Matrices
Classical     Real-orthogonal three-term (2.1)           Irreducible tridiagonal
Classical     Szegő two-term (2.4)/three-term (2.2)      Almost unitary Hessenberg
[8]           General three-term (2.3)                   (H, 1)-well-free (Definition 4)
[8]           Szegő-type two-term (2.5)                  (H, 1)-semiseparable (Definition 3)
[8]           EGO-type two-term (2.16)                   (H, 1)-quasiseparable (Definition 1)
This paper    General l-term (2.45)                      (H, m)-well-free (Definition 5)
This paper    Szegő-type two-term (2.32)                 (H, m)-semiseparable (Definition 3)
This paper    EGO-type two-term (2.15)                   (H, m)-quasiseparable (Definition 1)

[Figure: set diagram with regions labeled "(H, m)-quasiseparable matrices, r.r. (15)", "(H, m)-well-free matrices, r.r. (45)", and "(H, m)-semiseparable matrices, r.r. (32)"]

Fig. 2.3 Relations between subclasses of (H, m)-quasiseparable matrices

Table 2.2 and Fig. 2.3 both mention (H, m)-well-free matrices. It is not
immediately obvious from the definition of (H, 1)-well-free matrices how one
should define an (H, m)-well-free matrix in a natural way. In Sect. 2.5, the details
of this extension are given, but we briefly describe the new definition here. A
matrix is (H, m)-well-free if

\[ \operatorname{rank} B_i^{(m)} = \operatorname{rank} B_i^{(m+1)}, \qquad i = 1, 2, \ldots, \tag{2.9} \]

where the matrices $B_i^{(m)}$ are formed from the columns of the partition $A_{12}$ of
Definition 1.

We show in this paper that (H, m)-well-free matrices and polynomials satisfying

rk{x) = {8k,kX + Ek.k)rk-\x H h {Sk+m-i.kX + £k+m-2.k)rk+m-i{x), (2.10) 

" V ' 

m+ \ terms 

provide a complete characterization of each other.

Next, consider briefly the m = 1 case to see that this generalization reduces
properly in the (H, 1) case. For m = 1, this relation implies that no wells of width
m = 1 form, as illustrated in Fig. 2.2.

2.2 Correspondences Between Hessenberg Matrices 
and Polynomial Systems 

In this section we give details of the correspondence between (H, m)-quasiseparable
matrices and systems of polynomials defined via (2.6), and explain how this
correspondence can be used in classifications of quasiseparable matrices in terms
of recurrence relations and vice versa.

2.2.1 A Bijection Between Invertible Triangular Matrices 
and Polynomial Systems 

Let $\mathcal{T}$ be the set of invertible upper triangular matrices and $\mathcal{P}$ be the set of
polynomial systems $\{r_k\}$ with $\deg r_k = k$. We next demonstrate that there is a
bijection between $\mathcal{T}$ and $\mathcal{P}$. Indeed, given a polynomial system
$R = \{r_0(x), r_1(x), \ldots, r_n(x)\} \in \mathcal{P}$ satisfying $\deg(r_k) = k$, there exist unique n-term
recurrence relations of the form

\[ r_0(x) = a_{0,0}, \qquad x \cdot r_{k-1}(x) = a_{k,k} \cdot r_k(x) - a_{k-1,k} \cdot r_{k-1}(x) - \cdots - a_{0,k} \cdot r_0(x), \qquad a_{k,k} \neq 0,\ k = 1, \ldots, n \tag{2.11} \]

because this formula represents $x \cdot r_{k-1} \in \mathcal{P}_k$ ($\mathcal{P}_k$ being the space of all polynomials
of degree at most k) in terms of $r_k, r_{k-1}, r_{k-2}, \ldots, r_0$, which form a basis in
$\mathcal{P}_k$, and hence these coefficients are unique. Forming a matrix $B \in \mathcal{T}$ from these
coefficients as $B = [a_{i,j}]_{i,j=0}^{n}$ (with zeros below the main diagonal), it is clear that
there is a bijection between $\mathcal{T}$ and $\mathcal{P}$, as they share the same unique parameters.

It is shown next that this bijection between invertible triangular matrices and
polynomial systems (satisfying $\deg r_k(x) = k$) can be viewed as a bijection
between strongly Hessenberg matrices together with two free parameters and
polynomial systems (satisfying $\deg r_k(x) = k$). Furthermore, the strongly Hessenberg
matrices and polynomial systems of this bijection are related via (2.6).
Indeed, it was shown in [24] that the confederate matrix A, the strongly upper
Hessenberg matrix defined by

\[ A = \begin{bmatrix}
a_{0,1} & a_{0,2} & a_{0,3} & \cdots & a_{0,n} \\
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & a_{2,2} & a_{2,3} & \cdots & \vdots \\
\vdots & \ddots & \ddots & \ddots & a_{n-2,n} \\
0 & \cdots & 0 & a_{n-1,n-1} & a_{n-1,n}
\end{bmatrix}, \tag{2.12} \]

or in terms of B as

\[ B = \begin{bmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} & \cdots & a_{0,n} \\
0 & a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & 0 & a_{2,2} & a_{2,3} & \cdots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \ddots & a_{n-2,n} \\
0 & 0 & \cdots & 0 & a_{n-1,n-1} & a_{n-1,n} \\
0 & 0 & \cdots & \cdots & 0 & a_{n,n}
\end{bmatrix}
= \left[ \begin{array}{c|c}
\begin{matrix} a_{0,0} \\ 0 \\ \vdots \\ 0 \end{matrix} & A \\ \hline
0 & \begin{matrix} 0 & \cdots & 0 & a_{n,n} \end{matrix}
\end{array} \right], \tag{2.13} \]

is related to the polynomial system R via (2.6). This shows the desired bijection,
with $a_{0,0}$ and $a_{n,n}$ serving as the two free parameters.

Remark 1 Based on this discussion, if $R = \{r_0, r_1, \ldots, r_{n-1}, r_n\}$ is related to a
matrix A via (2.6), then $R_{a,b} = \{a r_0, a r_1, \ldots, a r_{n-1}, b r_n\}$ for any nonzero parameters
a and b provides a full characterization of all polynomial systems related to the
matrix A.



2.2.2 Generators of (H, m)-quasiseparable Matrices 



It is well known that Definition 1, given in terms of ranks, is equivalent to another
definition in terms of a sparse representation of the elements of the matrix called
generators of the matrix, see, e.g., [13] and the references therein. Such sparse
representations are often used as inputs to fast algorithms involving such matrices.
We give next this equivalent definition.

Definition 2 (Generator definition for (H, m)-quasiseparable matrices). A matrix
A is called (H, m)-quasiseparable if (i) it is upper Hessenberg, and (ii) it can be
represented in the form

\[ A = \begin{bmatrix}
d_1 & g_1 h_2 & g_1 b_2 h_3 & \cdots & g_1 b_{1,n}^{\times} h_n \\
p_2 q_1 & d_2 & g_2 h_3 & \cdots & g_2 b_{2,n}^{\times} h_n \\
0 & p_3 q_2 & d_3 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & g_{n-1} h_n \\
0 & \cdots & 0 & p_n q_{n-1} & d_n
\end{bmatrix}, \tag{2.14} \]

where $b_{i,j}^{\times} = b_{i+1} \cdots b_{j-1}$ for $j > i + 1$ and $b_{i,j}^{\times} = 1$ for $j = i + 1$. The elements
$\{p_k, q_k, d_k, g_k, b_k, h_k\}$, called the generators of the matrix A, are matrices of sizes

          p_{k+1} q_k      d_k           g_k              b_k                h_k
Sizes     1 × 1            1 × 1         1 × u_k          u_{k-1} × u_k      u_{k-1} × 1
Range     k ∈ [1, n-1]     k ∈ [1, n]    k ∈ [1, n-1]     k ∈ [2, n-1]       k ∈ [2, n]

subject to $\max_k u_k = m$. The numbers $u_k$, $k = 1, \ldots, n - 1$, are called the orders of
these generators.

Remark 2 The generators of an (H, m)-quasiseparable matrix give us an $O(nm^2)$
representation of the elements of the matrix. In the (H, 1)-quasiseparable case,
where all generators can be chosen simply as scalars, this representation is $O(n)$.

Remark 3 The subdiagonal elements, despite being determined by a single value,
are written as a product $p_{k+1} q_k$, $k = 1, \ldots, n - 1$, to follow standard notations used
in the literature for quasiseparable matrices. We emphasize that this product acts as
a single parameter in the Hessenberg case to which this paper is devoted.

Remark 4 The generators in Definition 2 can always be chosen to have sizes
$u_k = m$ for all k by padding them with zeros to size m.
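The generator representation of Definition 2 can be illustrated in the scalar (H, 1) case, where every generator is a number. The sketch below assembles the matrix entry by entry; the function name, the use of dictionaries keyed by the 1-based indices of the text, and the sample tridiagonal matrix are assumptions made for this example.

```python
from fractions import Fraction

def assemble(n, d, pq, g, b, h):
    """Build an n x n (H, 1)-quasiseparable matrix from scalar generators.

    d[k]: diagonal entries, k = 1..n
    pq[k]: subdiagonal products p_{k+1} q_k, k = 1..n-1
    g[k] (k = 1..n-1), b[k] (k = 2..n-1), h[k] (k = 2..n): upper-triangular generators
    """
    A = [[0] * n for _ in range(n)]
    for i in range(1, n + 1):
        A[i - 1][i - 1] = d[i]
        if i < n:
            A[i][i - 1] = pq[i]
        prod = 1  # b^x_{i,i+1} = 1
        for j in range(i + 1, n + 1):
            A[i - 1][j - 1] = g[i] * prod * h[j]
            if j < n:
                prod *= b[j]  # extend the product to b^x_{i,j+1}
    return A

N = 4
HALF = Fraction(1, 2)
COMMON = dict(
    d={k: 0 for k in range(1, N + 1)},
    pq={k: HALF for k in range(1, N)},
    b={k: 0 for k in range(2, N)},
)
# Two different generator sets describing the same tridiagonal matrix.
A_FIRST = assemble(N, g={k: 1 for k in range(1, N)},
                   h={k: HALF for k in range(2, N + 1)}, **COMMON)
A_SECOND = assemble(N, g={k: HALF for k in range(1, N)},
                    h={k: 1 for k in range(2, N + 1)}, **COMMON)
```

Only the products $g_i b_{i,j}^{\times} h_j$ enter the matrix, so rescaling $g_k$ and $h_k$ in opposite directions leaves the matrix unchanged; the two generator sets above produce the same entries.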



Also, the ranks of the submatrices $A_{12}$ of Definition 1 represent the smallest
possible sizes of the corresponding generators. That is, denoting by
$A_{12}^{(k)} = A(1:k, k+1:n)$ the partition $A_{12}$ of the k-th symmetric partition, then

\[ \operatorname{rank} A_{12}^{(k)} \le u_k, \qquad k = 1, \ldots, n. \]

Furthermore, if generators can be chosen such that

\[ \max_k \operatorname{rank} A_{12}^{(k)} = \max_k u_k = m, \]

then A is an (H, m)-quasiseparable matrix, whereas if

\[ \max_k \operatorname{rank} A_{12}^{(k)} \le \max_k u_k = m, \]

then A is a weakly (H, m)-quasiseparable matrix, following the terminology of
Definition 1. As stated above, we will avoid making explicit distinctions between
(H, m)-quasiseparable matrices and weakly (H, m)-quasiseparable matrices.
For details on the existence of minimal size generators, see [16].



2.2.3 A Relation Between Generators of Quasiseparable 
Matrices and Recurrence Relations for Polynomials 

One way to establish a bijection (up to scaling as described in Remark 1) between
subclasses of (H, m)-quasiseparable matrices and polynomial systems specified by
recurrence relations is to deduce conversion rules between generators of the
classes of matrices and coefficients of the recurrence relations. In this approach, a
difficulty is encountered which is described by Fig. 2.4.

[Figure: diagram relating generators, recurrence relation coefficients, and polynomials through numbered relations]

Fig. 2.4 Relations between subclasses of (H, m)-quasiseparable matrices and polynomials

The difficulty is that the relation labeled (2) in the picture is a one-to-one
correspondence but those labeled (1) and (3) are not. This fact is illustrated by the next two
examples.

Example 1 (Nonuniqueness of recurrence relation coefficients). In contrast to the
n-term recurrence relations (2.11), other recurrence relations such as the l-term
recurrence relations (2.45) corresponding to a given polynomial system are not
unique. As a simple example of a system of polynomials satisfying more than one
set of recurrence relations of the form (2.45), consider the monomials
$R = \{1, x, x^2, \ldots, x^n\}$, easily seen to satisfy the recurrence relations

\[ r_0(x) = 1, \qquad r_k(x) = x \cdot r_{k-1}(x), \qquad k = 1, \ldots, n \]

as well as the recurrence relations

\[ r_0(x) = 1, \qquad r_1(x) = x \cdot r_0(x), \qquad r_k(x) = (x + 1) \cdot r_{k-1}(x) - x \cdot r_{k-2}(x), \qquad k = 2, \ldots, n. \]

Hence a given system of polynomials may be expressed using the same recurrence
relations but with different coefficients of those recurrence relations.
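The two recurrences of Example 1 are easy to compare numerically. The following sketch evaluates both at a few sample points; the function names and the sample points are assumptions chosen for illustration, and n ≥ 1 is assumed.

```python
def monomials_one_step(n, x):
    """r_0 = 1, r_k = x * r_{k-1}."""
    r = [1]
    for k in range(1, n + 1):
        r.append(x * r[k - 1])
    return r

def monomials_three_term(n, x):
    """r_0 = 1, r_1 = x * r_0, r_k = (x + 1) * r_{k-1} - x * r_{k-2} (n >= 1 assumed)."""
    r = [1, x]
    for k in range(2, n + 1):
        r.append((x + 1) * r[k - 1] - x * r[k - 2])
    return r

SAMPLE_POINTS = [-3, -1, 0, 2, 5]
AGREE = all(monomials_one_step(6, x) == monomials_three_term(6, x) for x in SAMPLE_POINTS)
```

Both loops reproduce the monomials, since $(x + 1) x^{k-1} - x \cdot x^{k-2} = x^k$; the coefficients of the recurrence differ while the polynomial system does not.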

Example 2 (Nonuniqueness of (H, m)-quasiseparable generators). Similarly,
given an (H, m)-quasiseparable matrix, there is a freedom in choosing the set of
generators of Definition 2. As a simple example, consider the matrix

\[ \begin{bmatrix}
0 & \frac{1}{2} & 0 & \cdots & 0 \\
\frac{1}{2} & 0 & \frac{1}{2} & \ddots & \vdots \\
0 & \frac{1}{2} & 0 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & \frac{1}{2} \\
0 & \cdots & 0 & \frac{1}{2} & 0
\end{bmatrix} \]

corresponding to a system of Chebyshev polynomials. It is obviously (H, 1)-quasiseparable
and can be defined by different sets of generators, with either
$g_k = 1$, $h_k = \frac{1}{2}$ or $g_k = \frac{1}{2}$, $h_k = 1$.

Remark 5 To overcome the difficulties of the nonuniqueness demonstrated here,
we can define equivalence classes of generators describing the same matrix and
equivalence classes of recurrence relations describing the same polynomials.
Working with representatives of these equivalence classes resolves the difficulty.

We begin the classification of recurrence relations of polynomials by considering
EGO-type two-term recurrence relations (2.15) in Sect. 2.3 and associating the set
of all (H, m)-quasiseparable matrices with them. Section 2.4 covers the correspondence
between polynomials satisfying (2.32) and (H, m)-semiseparable
matrices. In Sect. 2.5 we consider l-term recurrence relations (2.45) and
(H, m)-well-free matrices.



2.3 (H, m)-quasiseparable Matrices and EGO-type Two-term
Recurrence Relations (2.15)

In this section, we classify the recurrence relations corresponding to the class of
(H, m)-quasiseparable matrices. The next theorem is the main result of this
section.

Theorem 1 Suppose A is a strongly upper Hessenberg matrix. Then the following
are equivalent.

(i) A is (H, m)-quasiseparable.

(ii) There exist auxiliary polynomials $\{F_k(x)\}$ for some $\alpha_k$, $\beta_k$, and $\gamma_k$ of sizes
m × m, m × 1 and 1 × m, respectively, such that the system of polynomials
$\{r_k(x)\}$ related to A via (2.6) satisfies the EGO-type two-term recurrence
relations

\[ \begin{bmatrix} F_0(x) \\ r_0(x) \end{bmatrix} = \begin{bmatrix} 0 \\ a_{0,0} \end{bmatrix}, \qquad
\begin{bmatrix} F_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & \delta_k x + \theta_k \end{bmatrix} \begin{bmatrix} F_{k-1}(x) \\ r_{k-1}(x) \end{bmatrix}. \tag{2.15} \]

Remark 6 Throughout the paper, we will not distinguish between (H, m)-quasiseparable
and weakly (H, m)-quasiseparable matrices. The difference is technical;
for instance, considering an (H, 2)-quasiseparable matrix as a weakly (H, 3)-quasiseparable
matrix corresponds to artificially increasing the size of the vectors $F_k(x)$ in
(2.15) by one. This additional entry corresponds to a polynomial system that is
identically zero, or otherwise has no influence on the other polynomial systems. In a
similar way, any results stated for (H, m)-quasiseparable matrices are valid for
weakly (H, m)-quasiseparable matrices through such trivial modifications.

This theorem, whose proof will be provided by the lemma and theorems of this
section, is easily seen as a generalization of the following result for the (H, 1)-quasiseparable
case from [8].

Corollary 1 Suppose A is a strongly Hessenberg matrix. Then the following are
equivalent.

(i) A is (H, 1)-quasiseparable.

(ii) There exist auxiliary polynomials $\{F_k(x)\}$ for some scalars $\alpha_k$, $\beta_k$, and $\gamma_k$
such that the system of polynomials $\{r_k(x)\}$ related to A via (2.6) satisfies
the EGO-type two-term recurrence relations

\[ \begin{bmatrix} F_0(x) \\ r_0(x) \end{bmatrix} = \begin{bmatrix} 0 \\ a_{0,0} \end{bmatrix}, \qquad
\begin{bmatrix} F_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & \delta_k x + \theta_k \end{bmatrix} \begin{bmatrix} F_{k-1}(x) \\ r_{k-1}(x) \end{bmatrix}. \tag{2.16} \]

In establishing the one-to-one correspondence between the class of polynomials
satisfying (2.15) and the class of (H, m)-quasiseparable matrices, we will use the
following lemma which was given in [7] and is a consequence of Definition 2 and [24].

Lemma 1 Let A be an (H, m)-quasiseparable matrix specified by its generators
as in Definition 2. Then a system of polynomials $\{r_k(x)\}$ satisfies the recurrence
relations

\[ r_k(x) = \frac{1}{p_{k+1} q_k} \left[ (x - d_k)\, r_{k-1}(x) - \sum_{j=0}^{k-2} g_{j+1}\, b_{j+1,k}^{\times}\, h_k\, r_j(x) \right] \tag{2.17} \]

if and only if $\{r_k(x)\}$ is related to A via (2.6).

Note that we have not specified the sizes of the matrices $g_k$, $b_k$ and $h_k$ in (2.17)
explicitly but the careful reader can check that all matrix multiplications are well
defined. We will omit explicitly listing the sizes of generators where it is possible.

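Theorem 2 (EGO-type two-term recurrence relations ⇒ (H, m)-quasiseparable
matrices.) Let R be a system of polynomials satisfying the EGO-type two-term
recurrence relations (2.15). Then the (H, m)-quasiseparable matrix A defined by

\[ A = \begin{bmatrix}
-\frac{\theta_1}{\delta_1} & -\frac{1}{\delta_2}\gamma_2\beta_1 & -\frac{1}{\delta_3}\gamma_3\alpha_2\beta_1 & -\frac{1}{\delta_4}\gamma_4\alpha_3\alpha_2\beta_1 & \cdots & -\frac{1}{\delta_n}\gamma_n\alpha_{n-1}\alpha_{n-2}\cdots\alpha_3\alpha_2\beta_1 \\
\frac{1}{\delta_1} & -\frac{\theta_2}{\delta_2} & -\frac{1}{\delta_3}\gamma_3\beta_2 & -\frac{1}{\delta_4}\gamma_4\alpha_3\beta_2 & \cdots & -\frac{1}{\delta_n}\gamma_n\alpha_{n-1}\alpha_{n-2}\cdots\alpha_3\beta_2 \\
0 & \frac{1}{\delta_2} & -\frac{\theta_3}{\delta_3} & -\frac{1}{\delta_4}\gamma_4\beta_3 & \ddots & -\frac{1}{\delta_n}\gamma_n\alpha_{n-1}\cdots\alpha_4\beta_3 \\
\vdots & \ddots & \frac{1}{\delta_3} & -\frac{\theta_4}{\delta_4} & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \ddots & -\frac{1}{\delta_n}\gamma_n\beta_{n-1} \\
0 & \cdots & \cdots & 0 & \frac{1}{\delta_{n-1}} & -\frac{\theta_n}{\delta_n}
\end{bmatrix} \tag{2.18} \]

with generators

\[ d_k = -\frac{\theta_k}{\delta_k},\ k = 1, \ldots, n, \qquad p_{k+1} q_k = \frac{1}{\delta_k},\ k = 1, \ldots, n - 1, \qquad g_k = \beta_k^T,\ k = 1, \ldots, n - 1, \]
\[ b_k = \alpha_k^T,\ k = 2, \ldots, n - 1, \qquad h_k = -\frac{1}{\delta_k}\gamma_k^T,\ k = 2, \ldots, n, \]

corresponds to the system of polynomials R via (2.6).

Proof Considering EGO-type recurrence relations (2.15) we begin with

\[ r_k(x) = (\delta_k x + \theta_k)\, r_{k-1}(x) + \gamma_k F_{k-1}(x). \tag{2.19} \]

Using the relation $F_{k-1}(x) = \alpha_{k-1} F_{k-2}(x) + \beta_{k-1} r_{k-2}(x)$, (2.19) becomes

\[ r_k(x) = (\delta_k x + \theta_k)\, r_{k-1}(x) + \gamma_k \beta_{k-1} r_{k-2}(x) + \gamma_k \alpha_{k-1} F_{k-2}(x). \tag{2.20} \]

Equation (2.20) contains $F_{k-2}(x)$, which can be eliminated as was done in the
previous step. Using the relation $F_{k-2}(x) = \alpha_{k-2} F_{k-3}(x) + \beta_{k-2} r_{k-3}(x)$ we get

\[ r_k(x) = (\delta_k x + \theta_k)\, r_{k-1}(x) + \gamma_k \beta_{k-1} r_{k-2}(x) + \gamma_k \alpha_{k-1} \beta_{k-2} r_{k-3}(x) + \gamma_k \alpha_{k-1} \alpha_{k-2} F_{k-3}(x). \]

Continuing this process and noticing that $F_0$ is the vector of zeros, we obtain the
n-term recurrence relations

\[ r_k(x) = (\delta_k x + \theta_k)\, r_{k-1}(x) + \gamma_k \beta_{k-1} r_{k-2}(x) + \gamma_k \alpha_{k-1} \beta_{k-2} r_{k-3}(x) + \gamma_k \alpha_{k-1} \alpha_{k-2} \beta_{k-3} r_{k-4}(x) + \cdots + \gamma_k \alpha_{k-1} \cdots \alpha_2 \beta_1 r_0(x), \tag{2.21} \]

which define the matrix (2.18) with the desired generators by using the n-term
recurrence relations (2.17).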

Theorem 3 ((H, m)-quasiseparable matrices ⇒ EGO-type two-term recurrence
relations.) Let A be an (H, m)-quasiseparable matrix specified by the
generators $\{p_k, q_k, d_k, g_k, b_k, h_k\}$. Then the polynomial system R corresponding
to A satisfies

\[ \begin{bmatrix} F_0(x) \\ r_0(x) \end{bmatrix} = \begin{bmatrix} 0 \\ a_{0,0} \end{bmatrix}, \qquad
\begin{bmatrix} F_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & \delta_k x + \theta_k \end{bmatrix} \begin{bmatrix} F_{k-1}(x) \\ r_{k-1}(x) \end{bmatrix}, \tag{2.22} \]

with

\[ \alpha_k = \frac{p_k}{p_{k+1}}\, b_k^T, \qquad \beta_k = -\frac{1}{p_{k+1}}\, g_k^T, \qquad \gamma_k = \frac{p_k}{p_{k+1} q_k}\, h_k^T, \qquad \delta_k = \frac{1}{p_{k+1} q_k}, \qquad \theta_k = -\frac{d_k}{p_{k+1} q_k}. \]

Proof It is easy to see that every system of polynomials satisfying $\deg r_k = k$
(e.g. the one defined by (2.22)) also satisfies the n-term recurrence relations

\[ r_k(x) = (\alpha_k x - a_{k-1,k}) \cdot r_{k-1}(x) - a_{k-2,k} \cdot r_{k-2}(x) - \cdots - a_{0,k} \cdot r_0(x) \tag{2.23} \]

for some coefficients $\alpha_k, a_{k-1,k}, \ldots, a_{0,k}$. The proof is presented by showing that
these n-term recurrence relations in fact coincide with (2.17), so these coefficients
coincide with those of the n-term recurrence relations of the polynomials R. Using
relations for $r_k(x)$ and $F_{k-1}(x)$ from (2.22), we have

\[ r_k(x) = \frac{1}{p_{k+1} q_k} \left[ (x - d_k)\, r_{k-1}(x) - g_{k-1} h_k\, r_{k-2}(x) + p_{k-1} h_k^T b_{k-1}^T F_{k-2}(x) \right]. \tag{2.24} \]

Notice that again using (2.22) to eliminate $F_{k-2}(x)$ from Eq. (2.24) will result
in an expression for $r_k(x)$ in terms of $r_{k-1}(x)$, $r_{k-2}(x)$, $r_{k-3}(x)$, $F_{k-3}(x)$, and $r_0(x)$
without modifying the coefficients of $r_{k-1}(x)$, $r_{k-2}(x)$, or $r_0(x)$. Again applying
(2.22) to eliminate $F_{k-3}(x)$ results in an expression in terms of $r_{k-1}(x)$,
$r_{k-2}(x)$, $r_{k-3}(x)$, $r_{k-4}(x)$, $F_{k-4}(x)$, and $r_0(x)$ without modifying the coefficients of
$r_{k-1}(x)$, $r_{k-2}(x)$, $r_{k-3}(x)$, or $r_0(x)$. Continuing in this way, the n-term recurrence
relations of the form (2.23) are obtained without modifying the coefficients of the
previous ones.

Suppose that for some $0 < j < k - 1$ the expression for $r_k(x)$ is of the form

\[ r_k(x) = \frac{1}{p_{k+1} q_k} \left[ (x - d_k)\, r_{k-1}(x) - g_{k-1} h_k\, r_{k-2}(x) - \cdots - g_{j+1} b_{j+1,k}^{\times} h_k\, r_j(x) + p_{j+1} h_k^T (b_{j,k}^{\times})^T F_j(x) \right]. \tag{2.25} \]

Using (2.22) for $F_j(x)$ gives the relation

\[ F_j(x) = \frac{1}{p_{j+1} q_j} \left( p_j q_j b_j^T F_{j-1}(x) - q_j g_j^T r_{j-1}(x) \right). \tag{2.26} \]

Inserting (2.26) into (2.25) gives

\[ r_k(x) = \frac{1}{p_{k+1} q_k} \left[ (x - d_k)\, r_{k-1}(x) - g_{k-1} h_k\, r_{k-2}(x) - \cdots - g_j b_{j,k}^{\times} h_k\, r_{j-1}(x) + p_j h_k^T (b_{j-1,k}^{\times})^T F_{j-1}(x) \right]. \tag{2.27} \]

Therefore since (2.24) is the case of (2.25) for $j = k - 2$, (2.25) is true for each
$j = k - 2, k - 3, \ldots, 0$, and for $j = 0$, using the fact that $F_0 = 0$, we have

\[ r_k(x) = \frac{1}{p_{k+1} q_k} \left[ (x - d_k)\, r_{k-1}(x) - g_{k-1} h_k\, r_{k-2}(x) - \cdots - g_1 b_{1,k}^{\times} h_k\, r_0(x) \right]. \tag{2.28} \]

Since these coefficients coincide with those of (2.17), which are satisfied by the polynomial
system R, the polynomials given by (2.22) must coincide with these polynomials.
This proves the theorem.

These last two theorems provide the proof for Theorem 1, and complete the
discussion of the recurrence relations related to (H, m)-quasiseparable matrices.
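The equivalence established by Theorems 2 and 3 can be spot-checked numerically in the scalar (H, 1) case. The sketch below evaluates the polynomials once through the n-term relations (2.17) and once through the EGO-type two-term relations (2.22), using the conversion rules of Theorem 3 with the assumption $p_k = 1$ for all k (so that $q_k$ carries the whole subdiagonal product). The function names, the dictionary storage keyed by the 1-based indices of the text, and the sample generator values are all hypothetical.

```python
from fractions import Fraction

def bx(b, i, j):
    """b^x_{i,j} = b_{i+1} ... b_{j-1}; the empty product (j = i + 1) equals 1."""
    prod = Fraction(1)
    for t in range(i + 1, j):
        prod *= b[t]
    return prod

def r_n_term(n, x, d, q, g, b, h, a00=Fraction(1)):
    """Evaluate r_0..r_n at x via the n-term relations (2.17), with p_k = 1 assumed."""
    r = [a00]
    for k in range(1, n + 1):
        tail = sum(g[j + 1] * bx(b, j + 1, k) * h[k] * r[j] for j in range(k - 1))
        r.append(((x - d[k]) * r[k - 1] - tail) / Fraction(q[k]))
    return r

def r_ego(n, x, d, q, g, b, h, a00=Fraction(1)):
    """Evaluate r_0..r_n at x via the EGO-type relations (2.22), with p_k = 1 assumed."""
    F, r = Fraction(0), [a00]
    for k in range(1, n + 1):
        qk = Fraction(q[k])
        alpha = b.get(k, 0)           # alpha_1 multiplies F_0 = 0, so its value is irrelevant
        beta = -g.get(k, 0)           # beta_n only enters F_n, which is never used
        gamma = Fraction(h.get(k, 0)) / qk
        delta, theta = 1 / qk, -Fraction(d[k]) / qk
        # Both updates use the old F (tuple assignment evaluates the right side first).
        F, r_k = alpha * F + beta * r[k - 1], gamma * F + (delta * x + theta) * r[k - 1]
        r.append(r_k)
    return r

# Hypothetical scalar generators for n = 4; q[4] stands for the free parameter p_5 q_4.
GEN = dict(
    d={1: 1, 2: 2, 3: 3, 4: 4},
    q={1: 2, 2: 3, 3: 5, 4: 7},
    g={1: 1, 2: 2, 3: 3},
    b={2: 2, 3: -1},
    h={2: 1, 3: 4, 4: -2},
)
X0 = Fraction(2, 3)
VIA_N_TERM = r_n_term(4, X0, **GEN)
VIA_EGO = r_ego(4, X0, **GEN)
```

The two lists agree entry by entry for these sample values; the two-term form needs only a constant amount of work per step, whereas the n-term form sums over all previous polynomials.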



2.4 (H, m)-semiseparable Matrices and Szegő-type
Two-term Recurrence Relations (2.32)

In this section we consider a class of (H, m)-semiseparable matrices defined next.

Definition 3 ((H, m)-semiseparable matrices) A matrix A is called (H, m)-semiseparable
if (i) it is strongly upper Hessenberg, and (ii) it is of the form

\[ A = B + \operatorname{triu}(A_U, 1) \]

with $\operatorname{rank}(A_U) = m$ and a lower bidiagonal matrix B, where following the
MATLAB command triu, $\operatorname{triu}(A_U, 1)$ denotes the strictly upper triangular portion
of the matrix $A_U$.

Paraphrased, an (H, m)-semiseparable matrix has arbitrary diagonal entries,
arbitrary nonzero subdiagonal entries, and the strictly upper triangular part of a
rank m matrix. Obviously, an (H, m)-semiseparable matrix is (H, m)-quasiseparable.
Indeed, let A be (H, m)-semiseparable and n × n. Then it is clear that, if $A_{12}^{(k)}$
denotes the matrix $A_{12}$ of the k-th partition of Definition 1, then

\[ \operatorname{rank} A_{12}^{(k)} = \operatorname{rank} A(1:k, k+1:n) = \operatorname{rank} A_U(1:k, k+1:n) \le m, \qquad k = 1, \ldots, n - 1, \]

and A is (H, m)-quasiseparable by Definition 1.

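Example 3 (Unitary Hessenberg matrices are (H, 1)-semiseparable). Consider
again the unitary Hessenberg matrix

\[ H = \begin{bmatrix}
-\rho_0^*\rho_1 & -\rho_0^*\mu_1\rho_2 & -\rho_0^*\mu_1\mu_2\rho_3 & \cdots & -\rho_0^*\mu_1\mu_2\cdots\mu_{n-1}\rho_n \\
\mu_1 & -\rho_1^*\rho_2 & -\rho_1^*\mu_2\rho_3 & \cdots & -\rho_1^*\mu_2\mu_3\cdots\mu_{n-1}\rho_n \\
0 & \mu_2 & -\rho_2^*\rho_3 & \cdots & -\rho_2^*\mu_3\cdots\mu_{n-1}\rho_n \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \mu_{n-1} & -\rho_{n-1}^*\rho_n
\end{bmatrix} \tag{2.29} \]

which corresponds to a system of Szegő polynomials. Its strictly upper triangular
part is the same as in the matrix

\[ B = \begin{bmatrix}
-\rho_0^*\rho_1 & -\rho_0^*\mu_1\rho_2 & -\rho_0^*\mu_1\mu_2\rho_3 & \cdots & -\rho_0^*\mu_1\mu_2\mu_3\cdots\mu_{n-1}\rho_n \\
-\frac{\rho_1^*\rho_1}{\mu_1} & -\rho_1^*\rho_2 & -\rho_1^*\mu_2\rho_3 & \cdots & -\rho_1^*\mu_2\mu_3\cdots\mu_{n-1}\rho_n \\
-\frac{\rho_2^*\rho_1}{\mu_1\mu_2} & -\frac{\rho_2^*\rho_2}{\mu_2} & -\rho_2^*\rho_3 & \cdots & -\rho_2^*\mu_3\cdots\mu_{n-1}\rho_n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-\frac{\rho_{n-1}^*\rho_1}{\mu_1\mu_2\cdots\mu_{n-1}} & -\frac{\rho_{n-1}^*\rho_2}{\mu_2\mu_3\cdots\mu_{n-1}} & -\frac{\rho_{n-1}^*\rho_3}{\mu_3\cdots\mu_{n-1}} & \cdots & -\rho_{n-1}^*\rho_n
\end{bmatrix} \tag{2.30} \]

which can be constructed as, by definition, $\mu_k \neq 0$, $k = 1, \ldots, n - 1$.³ It is easy to
check that the rank of the matrix B is one.⁴ Hence the matrix (2.29) is (H, 1)-semiseparable.
Recall that any unitary Hessenberg matrix (2.29) uniquely corresponds
to a system of Szegő polynomials satisfying the recurrence relations

\[ \begin{bmatrix} \phi_0(x) \\ \phi_0^{\#}(x) \end{bmatrix} = \frac{1}{\mu_0} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
\begin{bmatrix} \phi_k(x) \\ \phi_k^{\#}(x) \end{bmatrix} = \frac{1}{\mu_k} \begin{bmatrix} 1 & -\rho_k^* \\ -\rho_k & 1 \end{bmatrix} \begin{bmatrix} \phi_{k-1}(x) \\ x\,\phi_{k-1}^{\#}(x) \end{bmatrix}. \tag{2.31} \]

The next theorem gives a classification of the class of (H, m)-semiseparable
matrices in terms of two-term recurrence relations that naturally generalize the
Szegő-type two-term recurrence relations. Additionally, it gives a classification in
terms of their generators as in Definition 2.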

³ The parameters $\mu_k$ associated with the Szegő polynomials are defined by $\mu_k = \sqrt{1 - |\rho_k|^2}$ for
$0 \le |\rho_k| < 1$ and $\mu_k = 1$ for $|\rho_k| = 1$, and since $|\rho_k| \le 1$ for all k, we always have $\mu_k \neq 0$.
⁴ Every i-th row of B equals the row number (i - 1) times $\rho_{i-1}^* / (\rho_{i-2}^* \mu_{i-1})$.

Theorem 4 Suppose A is a strongly upper Hessenberg n × n matrix. Then the
following are equivalent.

(i) A is (H, m)-semiseparable.
(ii) There exists a set of generators of Definition 2 corresponding to A such that
$b_k$ is invertible for $k = 2, \ldots, n$.
(iii) There exist auxiliary polynomials $\{G_k(x)\}$ for some $\alpha_k$, $\beta_k$, and $\gamma_k$ of sizes
m × m, m × 1 and 1 × m, respectively, such that the system of polynomials $\{r_k(x)\}$
related to A via (2.6) satisfies the Szegő-type two-term recurrence relations

\[ \begin{bmatrix} G_0(x) \\ r_0(x) \end{bmatrix} = \begin{bmatrix} a_{0,0} \\ \vdots \\ a_{0,0} \end{bmatrix}, \qquad
\begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & 1 \end{bmatrix} \begin{bmatrix} G_{k-1}(x) \\ (\delta_k x + \theta_k)\, r_{k-1}(x) \end{bmatrix}. \tag{2.32} \]

This theorem, whose proof follows from the results later in this section, leads to
the following corollary, which summarizes the results for the simpler class of
(H, 1)-semiseparable matrices as given in [8].

Corollary 2 Suppose A is an (H, 1)-quasiseparable matrix. Then the following
are equivalent.

(i) A is (H, 1)-semiseparable.
(ii) There exists a set of generators of Definition 2 corresponding to A such that
$b_k \neq 0$ for $k = 2, \ldots, n$.
(iii) There exist auxiliary polynomials $\{G_k(x)\}$ for some scalars $\alpha_k$, $\beta_k$, and $\gamma_k$
such that the system of polynomials $\{r_k(x)\}$ related to A via (2.6) satisfies
the Szegő-type two-term recurrence relations

\[ \begin{bmatrix} G_0(x) \\ r_0(x) \end{bmatrix} = \begin{bmatrix} a_{0,0} \\ a_{0,0} \end{bmatrix}, \qquad
\begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k & \beta_k \\ \gamma_k & 1 \end{bmatrix} \begin{bmatrix} G_{k-1}(x) \\ (\delta_k x + \theta_k)\, r_{k-1}(x) \end{bmatrix}. \tag{2.33} \]



2.4.1 (H, m)-semiseparable Matrices: Generator Classification 

We next give a lemma that provides a classification of (H, m)-semiseparable
matrices in terms of generators of an (H, m)-quasiseparable matrix.

Lemma 2 An (H, m)-quasiseparable matrix is (H, m)-semiseparable if and only
if there exists a choice of generators $\{p_k, q_k, d_k, g_k, b_k, h_k\}$ of the matrix such that
the matrices $b_k$ are nonsingular for all k = 2, ..., n − 1.

Proof Let A be (H, m)-semiseparable with $\mathrm{triu}(A, 1) = \mathrm{triu}(A_U, 1)$, where
$\mathrm{rank}(A_U) = m$. The latter statement implies that there exist row vectors $g_i$ and
column vectors $h_j$ of size m such that $A_U(i,j) = g_i h_j$ for all i, j, and therefore we
have $A_{ij} = g_i h_j$, $i < j$, or $A_{ij} = g_i b_{ij}^{\times} h_j$, $i < j$, with $b_k = I_m$.

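Conversely, suppose the generators of A are such that $b_k$ are invertible matrices
for k = 2, ..., n − 1. (The invertibility of $b_k$ implies that all $b_k$ are square m × m matrices.) Then the matrices

$$A_U(i,j) = \begin{cases} g_i b_{ij}^{\times} h_j & \text{if } 1 \le i < j \le n, \\ g_i b_i^{-1} h_i & \text{if } 1 < i = j < n, \\ g_i (b_j b_{j+1} \cdots b_i)^{-1} h_j & \text{if } 1 < j < i < n, \\ 0 & \text{if } j = 1 \text{ or } i = n, \end{cases} \qquad B(i,j) = \begin{cases} d_i & \text{if } 1 \le i = j \le n, \\ p_i q_j & \text{if } 1 \le i - 1 = j < n, \\ 0 & \text{otherwise} \end{cases}$$

are well defined, $\mathrm{rank}(A_U) = m$, B is lower bidiagonal, and $A = B + \mathrm{triu}(A_U, 1)$.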
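The converse direction of Lemma 2 can be checked numerically in the simplest setting. The following is a minimal sketch, assuming scalar generators (m = 1) and exact rational arithmetic; the helper names `build_A`, `build_AU`, and the sample generator values are hypothetical and chosen only for illustration. It uses the factorization $g_i b_{i+1} \cdots b_{j-1} h_j = \big(g_i / (b_2 \cdots b_i)\big)\big((b_2 \cdots b_{j-1}) h_j\big)$, which is valid when every $b_k$ is nonzero.

```python
from fractions import Fraction as Fr

def prod(values):
    """Product of an iterable of rationals (empty product is 1)."""
    out = Fr(1)
    for v in values:
        out *= v
    return out

def build_A(n, d, sub, g, b, h):
    """Assemble an upper Hessenberg matrix from scalar generators (1-based dicts).
    sub[k] plays the role of the subdiagonal product p_{k+1} q_k."""
    A = [[Fr(0)] * n for _ in range(n)]
    for i in range(1, n + 1):
        A[i - 1][i - 1] = Fr(d[i])
        if i > 1:
            A[i - 1][i - 2] = Fr(sub[i - 1])
        for j in range(i + 1, n + 1):
            A[i - 1][j - 1] = g[i] * prod(b[k] for k in range(i + 1, j)) * h[j]
    return A

def build_AU(n, g, b, h):
    """Rank-one matrix A_U(i,j) = u_i * w_j; requires every b_k to be nonzero."""
    AU = [[Fr(0)] * n for _ in range(n)]
    for i in range(1, n):                      # row n stays zero
        u = g[i] / prod(b[k] for k in range(2, i + 1))
        for j in range(2, n + 1):              # column 1 stays zero
            w = prod(b[k] for k in range(2, j)) * h[j]
            AU[i - 1][j - 1] = u * w
    return AU

def has_rank_at_most_one(M):
    """A matrix has rank <= 1 exactly when all of its 2x2 minors vanish."""
    n = len(M)
    return all(M[i][j] * M[k][l] == M[i][l] * M[k][j]
               for i in range(n) for k in range(n)
               for j in range(n) for l in range(n))

# Hypothetical sample generators for n = 5 (illustration only).
n = 5
d = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
sub = {1: 1, 2: 1, 3: 1, 4: 1}
g = {1: Fr(2), 2: Fr(1), 3: Fr(3), 4: Fr(1)}
b = {2: Fr(2), 3: Fr(-1), 4: Fr(1, 2)}
h = {2: Fr(1), 3: Fr(2), 4: Fr(1), 5: Fr(3)}

A = build_A(n, d, sub, g, b, h)
AU = build_AU(n, g, b, h)
same_strict_upper = all(A[i][j] == AU[i][j] for i in range(n) for j in range(i + 1, n))
```

With nonzero $b_k$ the strictly upper triangular part of A agrees with that of a rank-one matrix, which is the (H, 1)-semiseparable property; setting some $b_k = 0$ makes `build_AU` divide by zero, mirroring the role of invertibility in the lemma.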

Remark 7 We emphasize that the previous lemma guarantees the existence of
generators of an (H, m)-semiseparable matrix with invertible matrices $b_k$, and that
this condition need not be satisfied by all such generator representations. For
example, the following matrix



1 


1 


1 


1 





1 


2 


2 


2 





1 


1 


3 


3 







1 


1 


4 









1 


1 
1 



1 



is (H, 1)-semiseparable; however, it is obviously possible to choose a set of
generators for it with $b_k = 0$.



2.4.2 (H, m)-semiseparable Matrices. Recurrence Relations 
Classification 



In this section we present theorems giving the classification of (H, m)-semiseparable
matrices as those corresponding to systems of polynomials satisfying the
Szegő-type two-term recurrence relations (2.32).

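Theorem 5 (Szegő-type two-term recurrence relations ⇒ (H, m)-semiseparable
matrices) Let $R = \{r_0(x), \ldots, r_{n-1}(x)\}$ be a system of polynomials satisfying the
recurrence relations (2.32) with $\mathrm{rank}(\alpha_k - \beta_k\gamma_k) = m$. Then the (H, m)-semiseparable
matrix A defined by

$$A = \begin{bmatrix} -\dfrac{\theta_1 + \gamma_1\beta_0}{\delta_1} & \cdots & -\dfrac{1}{\delta_n}\gamma_n(\alpha_{n-1} - \beta_{n-1}\gamma_{n-1}) \cdots (\alpha_1 - \beta_1\gamma_1)\beta_0 \\ \dfrac{1}{\delta_1} & \ddots & -\dfrac{1}{\delta_n}\gamma_n(\alpha_{n-1} - \beta_{n-1}\gamma_{n-1}) \cdots (\alpha_2 - \beta_2\gamma_2)\beta_1 \\ & \ddots & \vdots \\ 0 & \dfrac{1}{\delta_{n-1}} & -\dfrac{\theta_n + \gamma_n\beta_{n-1}}{\delta_n} \end{bmatrix} \tag{2.34}$$

with generators

$$d_k = -\frac{\theta_k + \gamma_k\beta_{k-1}}{\delta_k}, \quad k = 1, \ldots, n, \qquad p_{k+1}q_k = \frac{1}{\delta_k}, \quad k = 1, \ldots, n-1,$$

$$g_k^T = \beta_{k-1}, \quad k = 1, \ldots, n-1, \qquad b_k^T = \alpha_{k-1} - \beta_{k-1}\gamma_{k-1}, \quad k = 2, \ldots, n-1,$$

$$h_k^T = -\frac{1}{\delta_k}\gamma_k(\alpha_{k-1} - \beta_{k-1}\gamma_{k-1}), \quad k = 2, \ldots, n,$$

corresponds to R via (2.6).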

Proof Let us show that the polynomial system satisfying the Szegő-type two-term
recurrence relations (2.32) also satisfies EGO-type two-term recurrence relations
(2.15). By applying the given two-term recursion, we have

$$\begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_k G_{k-1}(x) + \beta_k(\delta_k x + \theta_k)\, r_{k-1}(x) \\ \gamma_k G_{k-1}(x) + (\delta_k x + \theta_k)\, r_{k-1}(x) \end{bmatrix}. \tag{2.35}$$



Multiplying the second equation in (2.35) by $\beta_k$ and subtracting from the first
equation we obtain

$$G_k(x) - \beta_k r_k(x) = (\alpha_k - \beta_k\gamma_k)\, G_{k-1}(x). \tag{2.36}$$

Denoting in (2.36) $G_{k-1}$ by $F_k$ and shifting indices from k to k − 1 we get the
recurrence relation

$$F_k(x) = (\alpha_{k-1} - \beta_{k-1}\gamma_{k-1})\, F_{k-1}(x) + \beta_{k-1}\, r_{k-1}(x). \tag{2.37}$$



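In the same manner, substituting (2.36) in the second equation of (2.35) and
shifting indices, one can see that

$$r_k(x) = \gamma_k(\alpha_{k-1} - \beta_{k-1}\gamma_{k-1})\, F_{k-1}(x) + (\delta_k x + \theta_k + \gamma_k\beta_{k-1})\, r_{k-1}(x). \tag{2.38}$$

Equations (2.37) and (2.38) together give the necessary EGO-type two-term
recurrence relations for the system of polynomials:

$$\begin{bmatrix} F_k(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} \alpha_{k-1} - \beta_{k-1}\gamma_{k-1} & \beta_{k-1} \\ \gamma_k(\alpha_{k-1} - \beta_{k-1}\gamma_{k-1}) & \delta_k x + \theta_k + \gamma_k\beta_{k-1} \end{bmatrix} \begin{bmatrix} F_{k-1}(x) \\ r_{k-1}(x) \end{bmatrix}. \tag{2.39}$$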



The result follows from Theorem 2 and (2.39). 
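The relation between the Szegő-type recurrence and the matrix entries can be checked on a small example. The following is a minimal sketch, assuming the scalar case (m = 1), exact rational arithmetic, and the convention $G_0 = \beta_0 r_0$ with $\beta_0 = 1$; the parameter values and helper names are hypothetical. It generates $r_0, \ldots, r_N$ from the two-term recurrence and verifies, column by column, that $x\, r_{j-1}(x) = \sum_{i=1}^{j+1} a_{ij}\, r_{i-1}(x)$, where $a_{j+1,j} = 1/\delta_j$, $a_{jj} = -(\theta_j + \gamma_j\beta_{j-1})/\delta_j$, and $a_{ij} = -\frac{1}{\delta_j}\gamma_j \prod_{k=i}^{j-1}(\alpha_k - \beta_k\gamma_k)\, \beta_{i-1}$ for i < j.

```python
from fractions import Fraction as Fr

# Polynomials are coefficient lists: p[i] is the coefficient of x**i.
def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else Fr(0)) + (q[i] if i < len(q) else Fr(0))
            for i in range(n)]

def pscale(c, p):
    return [c * a for a in p]

def times_linear(dlt, tht, p):
    """Return (dlt*x + tht) * p(x)."""
    return padd([Fr(0)] + pscale(dlt, p), pscale(tht, p))

# Hypothetical scalar parameters, indexed k = 1..N (beta also has index 0).
N = 4
a00 = Fr(2)
alpha = {1: Fr(1), 2: Fr(3), 3: Fr(-1), 4: Fr(2)}
beta = {0: Fr(1), 1: Fr(2), 2: Fr(1, 2), 3: Fr(1), 4: Fr(-3)}
gamma = {1: Fr(1, 3), 2: Fr(2), 3: Fr(-1), 4: Fr(1)}
delta = {1: Fr(1), 2: Fr(2), 3: Fr(1, 2), 4: Fr(3)}
theta = {1: Fr(0), 2: Fr(1), 3: Fr(-2), 4: Fr(1)}

# Two-term recurrence: G_k = a_k G_{k-1} + b_k (d_k x + t_k) r_{k-1},
#                      r_k = c_k G_{k-1} +     (d_k x + t_k) r_{k-1}.
r = [[a00]]
G = [[beta[0] * a00]]
for k in range(1, N + 1):
    shifted = times_linear(delta[k], theta[k], r[k - 1])
    G.append(padd(pscale(alpha[k], G[k - 1]), pscale(beta[k], shifted)))
    r.append(padd(pscale(gamma[k], G[k - 1]), shifted))

def entry(i, j):
    """Entry a_{ij} of the Hessenberg matrix for 1 <= i <= j + 1."""
    if i == j + 1:
        return 1 / delta[j]
    if i == j:
        return -(theta[j] + gamma[j] * beta[j - 1]) / delta[j]
    middle = Fr(1)
    for k in range(i, j):
        middle *= alpha[k] - beta[k] * gamma[k]
    return -gamma[j] * middle * beta[i - 1] / delta[j]

def column_relation_holds(j):
    """Check x * r_{j-1}(x) == sum_{i=1}^{j+1} a_{ij} r_{i-1}(x)."""
    lhs = [Fr(0)] + r[j - 1]
    rhs = [Fr(0)]
    for i in range(1, j + 2):
        rhs = padd(rhs, pscale(entry(i, j), r[i - 1]))
    width = max(len(lhs), len(rhs))
    return (lhs + [Fr(0)] * (width - len(lhs))
            == rhs + [Fr(0)] * (width - len(rhs)))

all_columns_ok = all(column_relation_holds(j) for j in range(1, N + 1))
```

Exact fractions are used instead of floating point so that the polynomial identities can be compared with `==` rather than a tolerance.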
Theorem 6 ((H, m)-semiseparable matrices ⇒ Szegő-type two-term recurrence
relations) Let A be an (H, m)-semiseparable matrix. Then for a set of generators
$\{p_k, q_k, d_k, g_k, b_k, h_k\}$ of A such that each $b_k$ is invertible, the polynomial system R
corresponding to A satisfies (2.32); specifically,

$$\begin{bmatrix} G_k(x) \\ r_k(x) \end{bmatrix} = \frac{1}{p_{k+1}q_k} \begin{bmatrix} v_k & -g_{k+1}^T \\ h_k^T (b_k^T)^{-1} & 1 \end{bmatrix} \begin{bmatrix} G_{k-1}(x) \\ u_k(x)\, r_{k-1}(x) \end{bmatrix} \tag{2.40}$$

with $u_k(x) = x - d_k + g_k b_k^{-1} h_k$, $v_k = p_{k+1}q_k b_{k+1}^T - g_{k+1}^T h_k^T (b_k^T)^{-1}$.

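Proof According to the definition of (H, m)-semiseparable matrices the given
polynomial system R must satisfy EGO-type two-term recurrence relations (2.15)
with $b_k$ invertible for all k. For the recurrence relations

$$\begin{bmatrix} F_k(x) \\ r_k(x) \end{bmatrix} = \frac{1}{p_{k+1}q_k} \begin{bmatrix} p_k q_k b_k^T & -q_k g_k^T \\ p_k h_k^T & x - d_k \end{bmatrix} \begin{bmatrix} F_{k-1}(x) \\ r_{k-1}(x) \end{bmatrix} \tag{2.41}$$

let us denote $p_{k+1}F_k(x)$ in (2.41) as $G_{k-1}(x)$, and then we can rewrite these equations
as

$$G_{k-1}(x) = b_k^T G_{k-2}(x) - g_k^T r_{k-1}(x), \qquad r_k(x) = \frac{1}{p_{k+1}q_k}\left[ h_k^T G_{k-2}(x) + (x - d_k)\, r_{k-1}(x) \right]. \tag{2.42}$$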



Using the invertibility of $b_k$ we are able to derive $G_{k-2}(x)$ from the first
equation of (2.42), and inserting it in the second equation we obtain

$$r_k(x) = \frac{1}{p_{k+1}q_k}\left[ h_k^T (b_k^T)^{-1} G_{k-1}(x) + (x - d_k + g_k b_k^{-1} h_k)\, r_{k-1}(x) \right]. \tag{2.43}$$

The second necessary recurrence relation can be obtained by substituting (2.43) in
the first equation of (2.42) and shifting indices from k − 1 to k:

$$G_k(x) = \frac{1}{p_{k+1}q_k}\left[ \left(p_{k+1}q_k b_{k+1}^T - g_{k+1}^T h_k^T (b_k^T)^{-1}\right) G_{k-1}(x) - g_{k+1}^T (x - d_k + g_k b_k^{-1} h_k)\, r_{k-1}(x) \right]. \tag{2.44}$$



This completes the proof. 

This completes the justification of Theorem 4. 
2.5 (H, m)-well-free Matrices and Recurrence
Relations (2.45)

In this section, we begin by considering the l-term recurrence relations of the
form

$$\begin{aligned} r_0(x) &= a_{0,0}, \qquad r_k(x) = \sum_{i=1}^{k} (\delta_{ik} x + \varepsilon_{ik})\, r_{i-1}(x), \quad k = 1, 2, \ldots, l-2, \\ r_k(x) &= \sum_{i=k-l+2}^{k} (\delta_{ik} x + \varepsilon_{ik})\, r_{i-1}(x), \quad k = l-1, l, \ldots, n. \end{aligned} \tag{2.45}$$

As we shall see below, the matrices that correspond to (2.45) via (2.6) form a new
subclass of (H, m)-quasiseparable matrices. As such, we then can also give a
generator classification of the resulting class. This problem was addressed in [8]
for the l = 3 case; that is, for (2.3),

$$r_0(x) = a_{0,0}, \qquad r_1(x) = (\alpha_1 x - \delta_1) \cdot r_0(x), \tag{2.46}$$

$$r_k(x) = (\alpha_k x - \delta_k) \cdot r_{k-1}(x) - (\beta_k x + \gamma_k) \cdot r_{k-2}(x), \tag{2.47}$$

and was already an involved problem. To explain the results in the general
case more clearly, we begin by recalling the results for the special case when
l = 3.



2.5.1 General Three-term Recurrence Relations
(2.3) and (H, 1)-well-free Matrices

In [8], it was proved that polynomials that satisfy the general three-term recurrence
relations (2.46) were related to a subclass of (H, 1)-quasiseparable matrices denoted
(H, 1)-well-free matrices. A definition of this class is given next.

Definition 4 ((H, 1)-well-free matrices)

• An n × n matrix $A = (A_{ij})$ is said to have a well of size one in column 1 < k < n
if $A_{ik} = 0$ for $1 \le i < k$ and there exists a pair (i, j) with $1 \le i < k$ and $k < j \le n$
such that $A_{ij} \ne 0$.

• An (H, 1)-quasiseparable matrix is said to be (H, 1)-well-free if none of its
columns k = 2, ..., n − 1 contain wells of size one.

Verbally, a matrix has a well in column k if all entries above the main diagonal
in the k-th column are zero, except if all entries in the upper-right block to the right
of these zeros are also zeros, as shown in the following illustration.

[Illustration: a column of zeros above the main diagonal, with the upper-right block to its right labeled "something nonzero".]



The following theorem summarizes the results of [8] that will be generalized in
this section.

Theorem 7 Suppose A is a strongly upper Hessenberg n × n matrix. Then the
following are equivalent.

(i) A is (H, 1)-well-free.
(ii) There exists a set of generators of Definition 2 corresponding to A such that
$h_k \ne 0$ for k = 2, ..., n.
(iii) The system of polynomials related to A via (2.6) satisfies the general
three-term recurrence relations (2.46).

Having provided these results, the next goal is, given the l-term recurrence
relations (2.45), to provide an analogous classification. A step in this direction can
be taken using a formula given by Barnett in [4] that gives for such recurrence
relations a formula for the entries of the related matrix. For the convenience of the
reader, a proof of this lemma is given at the end of this section (no proof was given
in [4]).

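Lemma 3 Let $R = \{r_0(x), \ldots, r_{n-1}(x)\}$ be a system of polynomials satisfying the
recurrence relations (2.45). Then the strongly Hessenberg matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ \frac{1}{\delta_{11}} & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & \frac{1}{\delta_{22}} & a_{33} & \cdots & \vdots \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{1}{\delta_{n-1,n-1}} & a_{nn} \end{bmatrix} \tag{2.48}$$

with entries

$$a_{ij} = -\frac{1}{\delta_{jj}} \left( \frac{\delta_{i-1,j}}{\delta_{i-1,i-1}} + \varepsilon_{ij} + \sum_{s=i}^{j-1} a_{is}\delta_{sj} \right), \qquad \frac{\delta_{0j}}{\delta_{00}} = 0, \ \forall j; \quad \delta_{ij} = \varepsilon_{ij} = 0, \ i < j - l + 2, \tag{2.49}$$

corresponds to R via (2.6).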
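The entry formula (2.49) lends itself to a direct numerical check. The following is a minimal sketch, assuming exact rational arithmetic and hypothetical coefficient values with l = 3; the function names are illustrative and not taken from [4]. It builds the polynomials from the banded recurrence, computes the entries $a_{ij}$ column by column with $a_{j+1,j} = 1/\delta_{jj}$, and verifies that $x\, r_{j-1}(x) = \sum_{i=1}^{j+1} a_{ij}\, r_{i-1}(x)$.

```python
from fractions import Fraction as Fr

# Polynomials are coefficient lists: p[i] is the coefficient of x**i.
def padd(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else Fr(0)) + (q[i] if i < len(q) else Fr(0))
            for i in range(n)]

def pscale(c, p):
    return [c * a for a in p]

def times_linear(dl, ep, p):
    """Return (dl*x + ep) * p(x)."""
    return padd([Fr(0)] + pscale(dl, p), pscale(ep, p))

def make_coeffs(N, l):
    """Hypothetical banded coefficients: zero whenever i < j - l + 2."""
    dl, ep = {}, {}
    for j in range(1, N + 1):
        for i in range(1, j + 1):
            in_band = i >= j - l + 2
            dl[(i, j)] = (Fr(j) if i == j else Fr(1, i + j)) if in_band else Fr(0)
            ep[(i, j)] = Fr(i - j - 1) if in_band else Fr(0)
    return dl, ep

def polynomials(N, dl, ep, a00):
    """r_j(x) = sum_{i=1}^{j} (dl_ij x + ep_ij) r_{i-1}(x), with r_0 = a00."""
    r = [[a00]]
    for j in range(1, N + 1):
        rj = [Fr(0)]
        for i in range(1, j + 1):
            rj = padd(rj, times_linear(dl[(i, j)], ep[(i, j)], r[i - 1]))
        r.append(rj)
    return r

def barnett_entries(N, dl, ep):
    """Entries a_ij of the Hessenberg matrix, computed column by column."""
    a = {}
    for j in range(1, N + 1):
        a[(j + 1, j)] = 1 / dl[(j, j)]
        for i in range(1, j + 1):
            first = dl[(i - 1, j)] / dl[(i - 1, i - 1)] if i > 1 else Fr(0)
            tail = sum((a[(i, s)] * dl[(s, j)] for s in range(i, j)), Fr(0))
            a[(i, j)] = -(first + ep[(i, j)] + tail) / dl[(j, j)]
    return a

def relation_holds(j, a, r):
    """Check x * r_{j-1}(x) == sum_{i=1}^{j+1} a_ij r_{i-1}(x)."""
    lhs = [Fr(0)] + r[j - 1]
    rhs = [Fr(0)]
    for i in range(1, j + 2):
        rhs = padd(rhs, pscale(a[(i, j)], r[i - 1]))
    width = max(len(lhs), len(rhs))
    return (lhs + [Fr(0)] * (width - len(lhs))
            == rhs + [Fr(0)] * (width - len(rhs)))

N, l = 5, 3
dl, ep = make_coeffs(N, l)
r = polynomials(N, dl, ep, Fr(1))
a = barnett_entries(N, dl, ep)
all_ok = all(relation_holds(j, a, r) for j in range(1, N + 1))
```

The entries of column j depend only on entries of the same row in earlier columns, so a single left-to-right sweep suffices.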
Remark 8 While Lemma 3 describes the entries of the matrix A corresponding to
polynomials satisfying the l-term recurrence relations (2.45), the structure of A is
not explicitly specified by (2.49). Indeed, as surveyed in this section, even in the
simplest case of generalized three-term recurrence relations (2.3), the latter do not
transparently lead to the characteristic quasiseparable and well-free properties of
the associated matrices.



2.5.2 (H, m)-well-free Matrices 



It was recalled in Sect. 2.5.1 that in the simplest case of three-term recurrence
relations the corresponding matrix was (H, 1)-quasiseparable, and moreover,
(H, 1)-well-free. So, one might expect that in the case of l-term recurrence
relations (2.45), the associated matrix might turn out to be (H, l − 2)-quasiseparable,
but how does one generalize the concept of (H, 1)-well-free? The answer to this is
given in the next definition.

Definition 5 ((H, m)-well-free matrices)

• Let A be an n × n matrix, and fix constants $k, m \in [1, n-1]$. Define the matrices

$$B_j^{(k,m)} = A(1:k,\; j+k : j+k+(m-1)), \qquad j = 1, \ldots, n-k-m.$$

Then if for some j,

$$\mathrm{rank}\big(B_j^{(k,m+1)}\big) > \mathrm{rank}\big(B_j^{(k,m)}\big),$$

the matrix A is said to have a well of size m in partition k.

• An (H, m)-quasiseparable matrix is said to be (H, m)-well-free if it contains no
wells of size m.

One can understand the matrices $B_j^{(k,m)}$ of the previous definition as, for constant
k and m and as j increases, a sliding window consisting of m consecutive
columns. Essentially, the definition states that as this window is slid through the
partition $A_{12}$ of Definition 1, if the ranks of the submatrices increase at any point
by adding the next column, this constitutes a well. So an (H, m)-well-free matrix is
such that each column of all partitions $A_{12}$ is the linear combination of the m
previous columns of $A_{12}$.
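The sliding-window test of Definition 5 translates directly into code. The following is a minimal sketch, assuming exact rational arithmetic and hypothetical helper names (`rank`, `window`, `find_wells`); it only applies the rank comparison of the definition and does not itself verify that the input is (H, m)-quasiseparable.

```python
from fractions import Fraction as Fr

def rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    M = [[Fr(x) for x in row] for row in rows]
    rk = 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        piv = next((i for i in range(rk, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[rk], M[piv] = M[piv], M[rk]
        for i in range(rk + 1, len(M)):
            f = M[i][c] / M[rk][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[rk])]
        rk += 1
    return rk

def window(A, k, j, width):
    """Rows 1..k and columns j+k .. j+k+width-1 of A (1-based, inclusive)."""
    return [row[j + k - 1: j + k - 1 + width] for row in A[:k]]

def find_wells(A, m):
    """Return all pairs (k, j) at which appending the next column raises the rank."""
    n = len(A)
    wells = []
    for k in range(1, n):
        for j in range(1, n - k - m + 1):
            if rank(window(A, k, j, m + 1)) > rank(window(A, k, j, m)):
                wells.append((k, j))
    return wells

# Hypothetical 4 x 4 examples with m = 1.
with_well = [[1, 0, 2, 4],
             [1, 1, 1, 2],
             [0, 1, 1, 3],
             [0, 0, 1, 1]]
well_free = [[1, 2, 4, 8],
             [1, 1, 2, 4],
             [0, 1, 1, 2],
             [0, 0, 1, 1]]

wells_found = find_wells(with_well, 1)
none_found = find_wells(well_free, 1)
```

In `with_well` the entry above the diagonal in column 2 is zero while the entry to its right is nonzero, which is the size-one well of Definition 4; the pair (k, j) = (1, 1) refers to partition k = 1 and window start column j + k = 2.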
Notice that Definition 5 reduces to Definition 4 in the case when m = 1. Indeed,
if m = 1, then the sliding windows are single columns, and an increase in rank is
the result of adding a nonzero column to a single column of all zeros. This is
shown next in (2.50).

[Illustration: the consecutive single-column windows $B_{j-1}$, $B_j$, $B_{j+1}$.] (2.50)

In order for a matrix to be (H, 1)-quasiseparable, any column of zeros in $A_{12}$
must be the first column of $A_{12}$; that is, in (2.50) j = 1. Thus a well of size one is
exactly a column of zeros above the diagonal, and some nonzero entry to the right
of that column, exactly as in Definition 4.

With the class of (H, m)-well-free matrices defined, we next present a theorem
containing the classifications to be proved in this section.

Theorem 8 Suppose A is a strongly upper Hessenberg n × n matrix. Then the
following are equivalent.

(i) A is (H, m)-well-free.

(ii) There exists a set of generators of Definition 2 corresponding to A such that
$b_k$ are companion matrices for k = 2, ..., n − 1, and $h_k = e_1$ for k =
2, ..., n, where $e_1$ is the first column of the identity matrix of appropriate size.
(iii) The system of polynomials related to A via (2.6) satisfies the general l-term
recurrence relations (2.45) with l = m + 2.

This theorem is an immediate corollary of Theorems 9, 10 and 11.



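2.5.3 (H, m)-well-free Matrices: Generator Classification



Theorem 9 An (H, m)-quasiseparable matrix is (H, m)-well-free if and only if
there exists a choice of generators $\{p_k, q_k, d_k, g_k, b_k, h_k\}$ of the matrix that are of
the form

$$b_k = \begin{bmatrix} 0 & \cdots & \cdots & 0 & \zeta_{k,1} \\ 1 & 0 & & \vdots & \vdots \\ & 1 & \ddots & \vdots & \vdots \\ & & \ddots & 0 & \vdots \\ 0 & \cdots & & 1 & \zeta_{k,m} \end{bmatrix}, \quad k = 2, \ldots, n-1, \qquad h_k = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad k = 2, \ldots, n. \tag{2.51}$$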
Proof Let $A = (a_{ij})$ be an (H, m)-well-free matrix. Then due to the low rank
property of off-diagonal blocks, its entries satisfy

$$a_{ij} = \sum_{s=j-m}^{j-1} a_{is}\,\alpha_{sj}, \qquad \text{if } i < j - m. \tag{2.52}$$

It is easy to see that an (H, m)-well-free matrix B with



dk — akk, k—\,...,n, Pk+ilk = ak+i,k, k=l, 

gk^ [ak.k+i-- -ak.k+m], ^ =!,...,«- 1, hk^[lO---0] 



M — 1, 

T 



■7* = 














<^k-m+\M 


1 














1 


















1 


^■k-l,k+l 
<^k,k+l 



k = 2. 



, « — 1. 



(2.53) 



coincides with A.

Conversely, suppose A is an (H, m)-quasiseparable matrix whose generators
satisfy (2.51). Applying (2.14) from Definition 2 it follows that



6l"i.j"j 



Vij-i i= l,...,n j = i,...,i + m, 

Y.Vj~m ais^j-m,s-j+m+\ ! = 1, • • -, « 7 = / + m + 1 , . . ., M. 

(2.54) 



This is equivalent to a summation of the form (2.52), demonstrating the low-rank
property, and hence the matrix A is (H, m)-well-free according to Definition 5.

This result generalizes the generator classification of (H, 1)-well-free matrices
as given in [8], stated as a part of Theorem 7.



2.5.4 (H, m)-well-free Matrices. Recurrence Relation
Classification



In this section, we will prove that it is exactly the class of (H, m)-well-free
matrices that correspond to systems of polynomials satisfying l-term recurrence
relations of the form (2.45).

Theorem 10 (l-term recurrence relations ⇒ (H, l − 2)-well-free matrices) Let
$A = (a_{ij})_{i,j=1}^{n}$ be a matrix corresponding to a system of polynomials $R =
\{r_0(x), \ldots, r_{n-1}(x)\}$ satisfying (2.45). Then A is (H, m)-well-free with m = l − 2.
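Proof The proof is presented by demonstrating that A has a set of generators of
the form (2.51), and hence is (H, m)-well-free. In particular, we show that

$$d_k = a_{kk}, \quad k = 1, \ldots, n, \qquad p_{k+1}q_k = \frac{1}{\delta_{kk}}, \quad k = 1, \ldots, n-1,$$

$$g_k = \begin{bmatrix} a_{k,k+1} & \cdots & a_{k,k+l-2} \end{bmatrix}, \quad k = 1, \ldots, n-1, \qquad h_k = \underbrace{\begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^T}_{l-2}, \quad k = 2, \ldots, n,$$

$$b_k = \begin{bmatrix} 0 & 0 & \cdots & 0 & -\dfrac{\delta_{k,k+l-2}}{\delta_{k+l-2,k+l-2}} \\ 1 & 0 & \cdots & 0 & -\dfrac{\delta_{k+1,k+l-2}}{\delta_{k+l-2,k+l-2}} \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & -\dfrac{\delta_{k+l-4,k+l-2}}{\delta_{k+l-2,k+l-2}} \\ 0 & \cdots & 0 & 1 & -\dfrac{\delta_{k+l-3,k+l-2}}{\delta_{k+l-2,k+l-2}} \end{bmatrix}, \quad k = 2, \ldots, n-1, \tag{2.55}$$

with $\delta_{i,k+l-2}/\delta_{k+l-2,k+l-2} = 0$ if $k > n - l + 2$, forms a set of generators of A. We show that with this
choice, the entries of the matrix A coincide with those of (2.49). From Definition 2,
the choice of $d_k$ as the diagonal of A and the choice of $p_{k+1}q_k$ as the subdiagonal entries
of (2.48) produces the desired result in these locations. We next show that the
generators $g_k$, $b_k$ and $h_k$ define the upper triangular part of the matrix A correctly.
Consider first the product $g_i b_{i+1} b_{i+2} \cdots b_{i+t}$, and note that

$$g_i b_{i+1} b_{i+2} \cdots b_{i+t} = \begin{bmatrix} a_{i,i+t+1} & \cdots & a_{i,i+t+l-2} \end{bmatrix}. \tag{2.56}$$

Indeed, for t = 0, (2.56) becomes

$$g_i = \begin{bmatrix} a_{i,i+1} & \cdots & a_{i,i+l-2} \end{bmatrix},$$

which coincides with the choice in (2.55) for each i, and hence the relation is true
for t = 0. Suppose next that the relation is true for some t. Then using the lower
shift structure of the choice of each $b_k$ of (2.55) and the formula (2.49), we have

$$\begin{aligned} g_i b_{i+1} b_{i+2} \cdots b_{i+t+1} &= \begin{bmatrix} a_{i,i+t+1} & \cdots & a_{i,i+t+l-2} \end{bmatrix} b_{i+t+1} \\ &= \begin{bmatrix} a_{i,i+t+2} & \cdots & a_{i,i+t+l-2} & -\dfrac{1}{\delta_{i+t+l-1,i+t+l-1}} \displaystyle\sum_{p=i+t+1}^{i+t+l-2} a_{ip}\,\delta_{p,i+t+l-1} \end{bmatrix} \\ &= \begin{bmatrix} a_{i,i+t+2} & \cdots & a_{i,i+t+l-1} \end{bmatrix}. \end{aligned}$$

And therefore

$$g_i b_{ij}^{\times} h_j = \begin{bmatrix} a_{ij} & \cdots & a_{i,j+l-3} \end{bmatrix} h_j = a_{ij}, \qquad j > i, \tag{2.57}$$

so (2.55) are in fact generators of the matrix A as desired.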
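Theorem 11 ((H, m)-well-free matrices ⇒ (m + 2)-term recurrence relations)
Let A be an (H, m)-well-free matrix. Then the polynomial system related to A via
(2.6) satisfies the l-term recurrence relations (2.45) with l = m + 2.

Proof By Theorem 9, there exists a choice of generators of A of the form

$$d_k = \nu_{k,0}, \quad k = 1, \ldots, n, \qquad p_{k+1}q_k = \mu_k, \quad k = 1, \ldots, n-1,$$

$$g_k = \begin{bmatrix} \nu_{k,1} & \cdots & \nu_{k,m} \end{bmatrix}, \quad k = 1, \ldots, n-1, \qquad h_k = \underbrace{\begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix}^T}_{m}, \quad k = 2, \ldots, n,$$

$$b_k = \begin{bmatrix} 0 & \cdots & 0 & \zeta_{k,1} \\ 1 & \ddots & \vdots & \vdots \\ & \ddots & 0 & \zeta_{k,m-1} \\ 0 & & 1 & \zeta_{k,m} \end{bmatrix}, \quad k = 2, \ldots, n-1. \tag{2.58}$$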



We present a procedure to compute from these values the coefficients of (2.45). 
1. Take 



^ij- 






— j = m + 2,...,n i^j -m,..., j-1. 



(2.59) 



2. Calculate $\varepsilon_{ij}$ and $\delta_{ij}$ for j = 2, ..., m + 1, i = 1, ..., j − 1 as any solution of
the following system of equations:



Vij-i = -lii[£ij + TfsJiVu-i^y 



Vij-i = -HjUi-ijlii_^ + £y + X;i=l V/,,-;^. 



3. Find the remaining $\varepsilon_{ij}$-coefficients using



'■=1, 

j==2, ...,m+ 1, 

(' == 2, . . ., OT, 

j = /+ l,...,m+ 1. 



(2.60) 



''1.0 



''7 



^i-ijf^i-i - Ei=) v,>-,(5,; j = m + 2,...,n 

i^j~m,...J~ 1. 



(2.61) 



The proof immediately follows by comparing (2.55), (2.58) and using (2.49). Note
that the coefficients of the l-term recurrence relations depend on the solution of the
system of equations (2.60), which consists of

$$\sum_{i=1}^{m} i = \frac{m(m+1)}{2}$$

equations and defines m(m + 1) variables. So for the generators (2.58) of an
(H, m)-well-free matrix there is a freedom in choosing coefficients of the
recurrence relations (2.45) for the corresponding polynomials.

This completes the justification of Theorem 8 stated above. In the m = 1 case,
this coincides with the result given in [8], stated as Theorem 7.



2.5.5 Proof of Lemma 3 

In this section we present a proof of Lemma 3, stated without proof by Barnett in [4].

Proof (Proof of Lemma 3) The results of [24] allow us to observe the bijection
between systems of polynomials and dilated strongly Hessenberg matrices. Indeed,
given a polynomial system $R = \{r_0(x), \ldots, r_{n-1}(x)\}$, there exist unique n-term
recurrence relations of the form

$$x \cdot r_{j-1}(x) = a_{j+1,j} \cdot r_j(x) + a_{j,j} \cdot r_{j-1}(x) + \cdots + a_{1,j} \cdot r_0(x), \qquad a_{j+1,j} \ne 0, \quad j = 1, \ldots, n-1, \tag{2.62}$$

and $a_{1,j}, \ldots, a_{j+1,j}$ are coefficients of the j-th column of the corresponding strongly
Hessenberg matrix A.

Using $\delta_{ij} = \varepsilon_{ij} = 0$, $i < j - l + 2$, we can assume that the given system of
polynomials $R = \{r_0(x), \ldots, r_{n-1}(x)\}$ satisfies full recurrence relations:

$$r_j(x) = \sum_{i=1}^{j} (\delta_{ij} x + \varepsilon_{ij})\, r_{i-1}(x), \qquad j = 1, \ldots, n-1. \tag{2.63}$$



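The proof of (2.49) is given by induction on j. For any i, if j = 1, it is true that
$a_{11} = -\varepsilon_{11}/\delta_{11}$. Next, assume that (2.49) is true for all j = 1, ..., k − 1. Taking j = k
in (2.63) we can write that

$$x\, r_{k-1}(x) = \frac{1}{\delta_{kk}} r_k(x) - \frac{\varepsilon_{kk}}{\delta_{kk}} r_{k-1}(x) - \frac{1}{\delta_{kk}} \sum_{i=1}^{k-1} (\delta_{ik} x + \varepsilon_{ik})\, r_{i-1}(x). \tag{2.64}$$

From the induction hypothesis and Eq. (2.62) we can substitute the expression for
$x\, r_{i-1}$ into (2.64) to obtain

$$x\, r_{k-1}(x) = \frac{1}{\delta_{kk}} r_k(x) - \frac{\varepsilon_{kk}}{\delta_{kk}} r_{k-1}(x) - \frac{1}{\delta_{kk}} \sum_{i=1}^{k-1} \left[ \delta_{ik} \sum_{s=1}^{i+1} a_{si}\, r_{s-1}(x) + \varepsilon_{ik}\, r_{i-1}(x) \right]. \tag{2.65}$$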
After grouping coefficients in (2.65) we obtain

$$x\, r_{k-1}(x) = \frac{1}{\delta_{kk}} r_k(x) - \frac{1}{\delta_{kk}} \sum_{i=1}^{k} \left( \frac{\delta_{i-1,k}}{\delta_{i-1,i-1}} + \varepsilon_{ik} + \sum_{s=i}^{k-1} a_{is}\,\delta_{sk} \right) r_{i-1}(x). \tag{2.66}$$

Comparing (2.62) and (2.66) we get (2.49) by induction.



2.6 Relationship Between These Subclasses
of (H, m)-quasiseparable Matrices

Thus far it has been proved that the classes of (H, m)-semiseparable and (H, m)-well-free
matrices are subclasses of the class of (H, m)-quasiseparable matrices.
The only unanswered questions for understanding the interplay between these classes
are whether these two subclasses have common elements or not, and whether either
class properly contains the other or not.

It was demonstrated in [8] that there is indeed a nontrivial intersection of the
classes of (H, 1)-semiseparable and (H, 1)-well-free matrices, and so there is at
least some intersection of the (weakly) (H, m) versions of these classes. In the next
example it will be shown that such a nontrivial intersection exists in the rank m
case; that is, there exist matrices that are both (H, m)-semiseparable and (H, m)-well-free.

Example 4 Let A be an (H, m)-quasiseparable matrix whose generators satisfy



eC" 



Regardless of the other choices of generators, one can see that these generators
satisfy both Lemma 2 and Theorem 9, and hence the matrix A is both (H, m)-well-free
and (H, m)-semiseparable.

The next example demonstrates that an (H, m)-semiseparable matrix need not
be (H, m)-well-free.

Example 5 Consider the (H, m)-quasiseparable matrix



A = 



"0 










r 






'1 


1 









1 












1 








1 


^ (prnxm 


hk = 












1 


1 






.0 



1 













1 


1 













1 


1 


1 










1 


1 













1 





Because of the shaded block of zeros, it can be seen that the matrix is not (H, 2)-well-free.
However, one can observe that $\mathrm{rank}(\mathrm{triu}(A, 1)) = 2$, and hence A is
(H, 2)-semiseparable. Thus the class of (H, m)-well-free matrices does not
contain the class of (H, m)-semiseparable matrices.

To see that an (H, m)-well-free matrix need not be (H, m)-semiseparable,
consider the banded matrix (2.8) from the introduction. It is easily verified not to
be (H, m)-semiseparable (for m < n − 1); however, it is (H, l − 2)-well-free.

This completes the discussion on the interplay of the subclasses of (H, m)-quasiseparable
matrices, as it has been shown that there is an intersection, but
neither subclass contains the other. Thus the proof of Fig. 2.3 is completed.

2.7 Conclusion 

To conclude, appropriate generalizations of real orthogonal polynomials and
Szegő polynomials, as well as several subclasses of (H, 1)-quasiseparable
polynomials, were used to classify the larger class of (H, m)-quasiseparable matrices
for arbitrary m. Classifications were given in terms of recurrence relations satisfied
by related polynomial systems, and in terms of special restrictions on the
quasiseparable generators.

Chapter 3

Partial Stabilization of Descriptor Systems 

Using Spectral Projectors 



Peter Benner 



Abstract We consider the stabilization problem for large-scale linear descriptor 
systems in continuous- and discrete-time. We suggest a partial stabilization 
algorithm which preserves stable poles of the system while the unstable ones are 
moved to the left half plane using state feedback. Our algorithm involves the 
matrix pencil disk function method to separate the finite from the infinite gen- 
eralized eigenvalues and the stable from the unstable eigenvalues. In order to 
stabilize the unstable poles, either the generalized Bass algorithm or an algebraic 
Bernoulli equation can be used. Some numerical examples demonstrate the 
behavior of our algorithm. 



3.1 Introduction 

We consider linear descriptor systems

$$E(\mathcal{D}x(t)) = Ax(t) + Bu(t), \quad t > 0, \quad x(0) = x_0, \tag{3.1}$$

where $\mathcal{D}x(t) = \frac{d}{dt}x(t)$, $t \in \mathbb{R}$, for continuous-time systems and $\mathcal{D}x(t) = x(t+1)$,
$t \in \mathbb{N}$, for discrete-time systems. Here, $A, E \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$. We assume the
matrix pencil $A - \lambda E$ to be regular, but make no assumption on its index. For
continuous-time systems, (3.1) is a first-order differential-algebraic equation (DAE)
if E is singular and an ordinary differential equation (ODE) if E is nonsingular. We
are particularly interested in the DAE case. In this case, $x(t) \in \mathbb{R}^n$ is called a
descriptor vector, which is in general not a state vector as some components may be
chosen freely under certain conditions, see [30, Chap. 4]. The components of the
vector $u(t) \in \mathbb{R}^m$ are considered to be forcing functions or controls. For further
properties of DAEs and descriptor systems see, e.g., [16, 19, 30] and references
therein.

Throughout this article we assume that the generalized eigenvalues of $A - \lambda E$
(the poles of the system corresponding to (3.1)) are given by

$$\Lambda(A, E) = \Lambda_1 \cup \Lambda_2 \cup \{\infty\}$$

with $\Lambda_j \subset \Gamma_j$, j = 1, 2. Here, $\Gamma_1 = \mathbb{C}^-$, $\Gamma_2 = \mathbb{C}^+$ for continuous-time systems
(with $\mathbb{C}^{\mp}$ denoting the open left and right half planes) and $\Gamma_1 = \{|z| < 1\}$, $\Gamma_2 =
\{|z| > 1\}$ (the interior/exterior of the unit disk) for discrete-time systems. The case
that eigenvalues are on the boundary of the stability region $\Gamma_1$ can be treated as
well, see Remark 2, but the dichotomy assumption with respect to $\partial\Gamma_1$ simplifies
the presentation for now.

Descriptor systems arise in various applications including circuit simulation, 
multibody dynamics, (semi-)discretization of the Stokes and Oseen equations 
(linearizations of the instationary Navier-Stokes equations) or the Euler equations, 
and in various other areas of applied mathematics and computational engineering, 
see, e.g., [30, 33] and references therein. 

The stabilization problem for (3.1) can be formulated as follows: choose $u \in
L_2(0, \infty; \mathbb{R}^m)$ such that the dynamical system (3.1) is asymptotically stable, i.e.,
solution trajectories satisfy $\lim_{t \to \infty} x(t) = 0$. Large-scale applications include
active vibration damping for large flexible space structures, like, for example, the
International Space Station [18], or initializing Newton's method for large-scale
algebraic Riccati equations (AREs) [37]. But also many other procedures for
controller and observer design make use of stabilization procedures [43].

For nonsingular E, it is well known (e.g., [20, 21]) that stabilization can be
achieved by state feedback $u = -Fx$, where $F \in \mathbb{R}^{m \times n}$ is chosen such that the
closed-loop system $E(\mathcal{D}x(t)) = (A - BF)x(t)$ is asymptotically stable, i.e.,
$\Lambda(A - BF, E) \subset \Gamma_1$, iff the matrix pair $(E^{-1}A, E^{-1}B)$ is stabilizable, i.e.,
$\mathrm{rank}([A - \lambda E, B]) = n$ for all $\lambda \in \mathbb{C} \setminus \Gamma_1$. (In the following, we will call $\Lambda(A, E)$
the open-loop eigenvalues and $\Lambda(A - BF, E)$ the closed-loop eigenvalues.)

For singular E, the situation is slightly more complicated. Varga [43] distinguishes
S- and R-stabilization problems. Both require the computation of a state
feedback matrix F such that the closed-loop matrix pencil $A - BF - \lambda E$ is regular.
S-stabilization asks for F such that $\Lambda(A - BF, E)$ contains exactly r = rank(E)
stable poles while for R-stabilization, all finite poles are requested to be stable.
Both problems have a solution under suitable conditions (see [43] and references
therein). Here, we will treat the R-stabilization problem only, for which the
assumption of stabilizability of the matrix triple (E, A, B), i.e., $\mathrm{rank}([A - \lambda E, B]) =
n$ for all finite $\lambda \in \mathbb{C} \setminus \Gamma_1$, guarantees solvability [19]. Our procedure can be useful
when solving the S-stabilization problem as well: the R-stabilization is needed
after a preprocessing step in the procedure for S-stabilization suggested in [43].
Also note that for the S-stabilization problem to be solvable, one needs to assume
strong stabilizability of the descriptor system (3.1) (see [43]), which is rarely
encountered in practice. E.g., none of the examples considered in this paper has
this property.

In the standard ODE case {E nonsingular), a stabilizing feedback matrix F can 
be computed in several ways. One popular approach is to use pole assignment (see, 
e.g., [20, Sect. 10.4]) which allows to pre-assign the spectrum of the closed-loop 
system. There are several difficulties associated with this approach [25]. Also, for 
fairly large-scale problems, the usual algorithms are quite involved [20, Chap. 11]. 
In many applications, it is not necessary to fix the poles of the closed-loop system, 
in particular if the stabilization is only used to initialize a control algorithm or 
Newton's method for AREs. For this situation, a standard approach is based on the 
solution of a particular Lyapunov equation (see Proposition 2 and Eq. (3.3)). 
Solving this Lyapunov equation, the stabilizing feedback for standard state-space
systems (i.e., $E = I_n$) is $F := B^T X^{\dagger}$ for continuous-time systems and $F :=
B^T (E X E^T + B B^T)^{\dagger} A$ for discrete-time systems, where X is the solution of the
(continuous or discrete) Lyapunov equation and $M^{\dagger}$ is the pseudo-inverse of M,
see, e.g., [37]. This approach is called the Bass algorithm [2, 3]. For a detailed
discussion of properties and (dis-)advantages of the described stabilization
procedure see [20, Sect. 10.2] and [37]. Both approaches, i.e., pole assignment and the
Bass algorithm, are generalized in [43] in order to solve the R-stabilization
problem. Here, we will assume that achieving closed-loop stability is sufficient and
poles need not be at specific locations, and therefore we will not discuss pole
placement any further.

For large-scale systems, often, the number of unstable poles is small (e.g., 5%). 
Hence it is often more efficient and reliable to first separate the stable and unstable 
poles and then to apply the stabilization procedure only to the unstable poles. For 
standard systems, such a method is described in [26] while for generalized state- 
space systems with invertible E, procedures from [43] and [8] can be employed. 
The approach used in [8] is based on the disk function method and only computes
a block-triangularization of the matrix pencil $A - \lambda E$, which has the advantage of
avoiding the unnecessary and sometimes ill-conditioned separation of all eigenvalues
required in the QZ algorithm for computing the generalized Schur form of
$A - \lambda E$, which is employed in [43]. The main contribution of this paper is thus to
show how the disk function method can be used in order to generalize the method
from [8] to descriptor systems with singular E.

In the next section, we will give some necessary results about descriptor 
systems and stabilization of generalized state-space systems with nonsingular 
matrix E. Section 3.3 then provides a review of spectral projection methods and in 
particular of the disk and sign function methods. A partial stabilization algorithm 
based on the disk function method is then proposed in Sect. 3.4. In Sect. 3.5, we 
give some numerical examples demonstrating the effectiveness of the suggested 
approach. We end with some conclusions and an outlook. 
3.2 Theoretical Background

As we will only discuss the R-stabilization problem, we will need the following
condition.

Definition 1 Let (E, A, B) be a matrix triple as in (3.1). Then the descriptor system
(3.1) is stabilizable if

$$\mathrm{rank}([A - \lambda E, B]) = n \quad \text{for all } \lambda \in \mathbb{C} \setminus \Gamma_1.$$

We say that the system is c-stabilizable if $\Gamma_1 = \mathbb{C}^-$ and d-stabilizable if
$\Gamma_1 = \{z \in \mathbb{C} \mid |z| < 1\}$.

The following result from [19] guarantees solvability of the R-stabilization
problem.

Proposition 1 Let the descriptor system (3.1) be stabilizable. Then there exists a
feedback matrix $F \in \mathbb{R}^{m \times n}$ such that

$$\Lambda(A - BF, E) \subset \Gamma_1 \cup \{\infty\}.$$

In the following, we will speak of (partial) stabilization and always mean
R-stabilization.

If E is nonsingular, any method for stabilizing standard state-space systems can
be generalized to this situation. Here, we will make use of generalizations of the
Bass algorithm [2, 3] and an approach based on the (generalized) algebraic
Bernoulli equation (ABE)

$$A^T X E + E^T X A - E^T X B B^T X E = 0. \tag{3.2}$$

The first approach is based on the following result which can be found, e.g., in [8, 43].

Proposition 2 Let (E, A, B) as in (3.1) be stabilizable with E nonsingular. If

$$F := B^T E^{-T} X_c^{\dagger} \quad \text{or} \quad F := B^T (E X_d E^T + B B^T)^{\dagger} A \tag{3.3}$$

for continuous- or discrete-time systems, respectively, where $X_c$ and $X_d$ are the
unique solutions of the generalized (continuous or discrete) Lyapunov equations

$$(A + \beta_c E) X E^T + E X (A + \beta_c E)^T = 2 B B^T \quad \text{or} \quad A X A^T - \beta_d^2 E X E^T = 2 B B^T, \tag{3.4}$$

respectively, for $\beta_c > \max_{\lambda \in \Lambda(A,E)} \{-\mathrm{Re}(\lambda)\}$, $0 < \beta_d < \min_{\lambda \in \Lambda(A,E) \setminus \{0\}} \{|\lambda|\}$, then
$A - BF - \lambda E$ is stable.

An often used, but conservative upper bound for the parameter $\beta_c$ in the
continuous Lyapunov equations above is $\|E^{-1}A\|$ for any matrix norm. As we will
apply Proposition 2 in our partial stabilization procedure only to a matrix pencil
that is completely unstable, i.e., $\Lambda(A, E) \subset \Gamma_2$, we can set $\beta_c = 0$ and $\beta_d = 1$.
Usually, it turns out to be more effective to set $\beta_c > 0$ as this yields a better
stability margin of the closed-loop poles. Note that $\beta_c$ (or $\beta_d$ in the discrete-time
case) serves as a spectral shift so that the stabilized poles are to the left of $-\beta_c$ (or
inside a circle with radius $\beta_d$).
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Stabilization using the ABE (3.2) can be used for continuous-time systems and 
is based on the following result [6]. 

Proposition 3 If {E,A,B) is as in Proposition 2 and A(A,£') n jK = 0, then the 
ABE (3.2) has a unique c-stabilizing positive semidefinite solution Z+, i.e., 
A{A~BB^X+E,E) C C". 

Moreover, rank(X+) = fi, where /i is the number of eigenvalues of A — XE in 

C^ and 

K{A-BB^X+E,E) = (A(A,£)nC")u(-(A(A,£)nC+)). 
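The reflection property in Proposition 3 can be checked numerically in the special case that is relevant later, namely a completely unstable pencil. The sketch below rests on assumptions that are not part of the proposition: all eigenvalues of $A - \lambda E$ lie in the open right half plane and the system is controllable, so that $X_+$ is nonsingular and $Y = X_+^{-1}$ solves the Lyapunov equation $(AE^{-1})Y + Y(AE^{-1})^T = BB^T$ (multiply (3.2) by $X^{-1}E^{-T}$ from the left and $E^{-1}X^{-1}$ from the right). The function names and the test matrices are hypothetical, and this is not the sign function based ABE solver of [6].

```python
import numpy as np

def lyap(M, Q):
    """Solve M X + X M^T = Q via Kronecker products (small dense problems only)."""
    n = M.shape[0]
    I = np.eye(n)
    K = np.kron(I, M) + np.kron(M, I)
    x = np.linalg.solve(K, Q.reshape(-1, order="F"))
    return x.reshape((n, n), order="F")

def abe_antistable(E, A, B):
    """Stabilizing ABE solution, assuming a completely unstable, controllable system."""
    A_hat = A @ np.linalg.inv(E)          # A E^{-1}
    Y = lyap(A_hat, B @ B.T)              # Y = X^{-1}
    return np.linalg.inv(Y)

# Hypothetical test data: eigenvalues of (A, E) are 1, 1.5, 0.5.
E = np.diag([1.0, 2.0, 4.0])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 2.0]])
B = np.array([[0.0], [0.0], [4.0]])

X = abe_antistable(E, A, B)
residual = A.T @ X @ E + E.T @ X @ A - E.T @ X @ B @ B.T @ X @ E   # ABE (3.2)
open_loop = np.linalg.eigvals(np.linalg.solve(E, A))
closed_loop = np.linalg.eigvals(np.linalg.solve(E, A - B @ B.T @ X @ E))
print(np.sort(open_loop.real), np.sort(closed_loop.real))
```

In this setting the feedback is $F = B^T X_+ E$, and the closed-loop poles are the open-loop poles mirrored at the imaginary axis.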

Another possibility for stabilization of standard discrete-time systems (E invertible
in (3.1)) is discussed in [22]. An extension of this method to the case of singular E 
would allow the stabilization of large-scale, sparse discrete-time descriptor 
systems. This is under current investigation. 

3.3 Spectral Projection Methods 

In this section we provide the necessary background on spectral projectors and 
methods to compute them. These will be the major computational steps required in 
the partial stabilization method described in Sect. 3.4. 

3.3.1 Spectral Projectors 

First, we give some fundamental definitions and properties of projection matrices. 

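Definition 2 A matrix $P \in \mathbb{R}^{n \times n}$ is a projector (onto a subspace $S \subset \mathbb{R}^n$) if
$\operatorname{range}(P) = S$ and $P^2 = P$.

Definition 3 Let $Z - \lambda Y$ with $Z, Y \in \mathbb{R}^{n \times n}$ be a regular matrix pencil with
$\Lambda(Z, Y) = \Lambda_1 \cup \Lambda_2$, $\Lambda_1 \cap \Lambda_2 = \emptyset$, and let $S_1$ be the (right) deflating subspace of the
matrix pencil $Z - \lambda Y$ corresponding to $\Lambda_1$. Then a projector onto $S_1$ is called a
spectral projector.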

From this definition we obtain the following properties of spectral projectors. 

Lemma 1 Let $Z - \lambda Y$ be as in Definition 3, and let $P \in \mathbb{R}^{n \times n}$ be a spectral
projector onto the right deflating subspace of $Z - \lambda Y$ corresponding to $\Lambda_1$. Then

a. $\operatorname{rank}(P) = |\Lambda_1| =: k$,

b. $\ker(P) = \operatorname{range}(I - P)$, $\operatorname{range}(P) = \ker(I - P)$,

c. $I - P$ is a spectral projector onto the right deflating subspace of $Z - \lambda Y$
corresponding to $\Lambda_2$.

Given a spectral projector P we can compute an orthogonal basis for the
corresponding deflating subspace $S_1$ and a spectral or block decomposition of $Z - \lambda Y$
in the following way: let

$$P = V R \Pi^T, \qquad R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}, \quad R_{11} \in \mathbb{R}^{k \times k},$$

be a QR decomposition with column pivoting (or a rank-revealing QR
decomposition (RRQR)) [24], where $\Pi$ is a permutation matrix.
Then the first k columns of V form an orthonormal basis for $S_1$. If we also know
an orthonormal basis of the corresponding left deflating subspace and extend this
to a full orthogonal matrix $U \in \mathbb{R}^{n \times n}$, we can transform Z, Y to block-triangular
form



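$$\tilde{Z} - \lambda \tilde{Y} := U^T (Z - \lambda Y) V = \begin{bmatrix} Z_{11} & Z_{12} \\ 0 & Z_{22} \end{bmatrix} - \lambda \begin{bmatrix} Y_{11} & Y_{12} \\ 0 & Y_{22} \end{bmatrix}, \tag{3.5}$$

where $\Lambda(Z_{11}, Y_{11}) = \Lambda_1$, $\Lambda(Z_{22}, Y_{22}) = \Lambda_2$.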

Once V is known, the orthogonal matrix U can be computed with little effort 
based on the following observation [42]. 

Proposition 4 Let $Z - \lambda Y$, $Z, Y \in \mathbb{R}^{n \times n}$, be a regular matrix pencil with no eigenvalues
on the boundary $\partial\Gamma_1$ of the stability region $\Gamma_1$. If the columns of $V_1 \in \mathbb{R}^{n \times n_1}$ form
an orthonormal basis of the stable right deflating subspace of $Z - \lambda Y$, i.e., the
deflating subspace corresponding to $\Lambda(Z, Y) \cap \Gamma_1$, then the first $n_1$ columns of the
orthogonal matrix U in the QR decomposition with column pivoting,

$$U R \Pi^T = U \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix} \Pi^T = [Z V_1 \;\; Y V_1], \tag{3.6}$$

form an orthonormal basis of the stable left deflating subspace of $Z - \lambda Y$.



The matrix U from (3.6) can then be used for the block-triangularization in (3.5). 

The block decomposition given in (3.5) will prove extremely useful in what 
follows. Besides spectral projectors onto the deflating subspaces corresponding to 
the finite and infinite parts of the spectrum of a matrix pencil, we will also need 
those related to the stability regions in continuous- and discrete-time. 

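Definition 4 Let $Z, Y \in \mathbb{R}^{n \times n}$ with $\Lambda(Z, Y) = \Lambda_1 \cup \Lambda_2$, $\Lambda_1 \cap \Lambda_2 = \emptyset$, and let $S_1$
be the (right) deflating subspace of the matrix pencil $Z - \lambda Y$ corresponding to $\Lambda_1$.
Then $S_1$ is called

a. c-stable if $\Lambda_1 \subset \mathbb{C}^-$ and c-unstable if $\Lambda_1 \subset \mathbb{C}^+$;

b. d-stable if $\Lambda_1 \subset \{|z| < 1\}$ and d-unstable if $\Lambda_1 \subset \{|z| > 1\}$.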



3.3.2 The Matrix Sign Function 



The sign function method was first introduced by Roberts [36] to solve algebraic
Riccati equations. The sign function of a matrix $Z \in \mathbb{R}^{n \times n}$ with no eigenvalues on
the imaginary axis can be defined as follows: Let

$$Z = S \begin{bmatrix} J^- & 0 \\ 0 & J^+ \end{bmatrix} S^{-1}$$

denote the Jordan decomposition of Z where the Jordan blocks corresponding to the,
say, k eigenvalues in the open left half plane are collected in $J^-$ and the Jordan
blocks corresponding to the remaining $n - k$ eigenvalues in the open right half plane
are collected in $J^+$. Then

$$\operatorname{sign}(Z) := S \begin{bmatrix} -I_k & 0 \\ 0 & I_{n-k} \end{bmatrix} S^{-1}.$$

The sign function provides projectors onto certain subspaces of the matrix Z:
$\mathcal{P}^- := \frac{1}{2}(I_n - \operatorname{sign}(Z))$ defines the oblique projection onto the c-stable Z-invariant
subspace along the c-unstable Z-invariant subspace whereas $\mathcal{P}^+ := \frac{1}{2}(I_n + \operatorname{sign}(Z))$
defines the oblique projection onto the c-unstable Z-invariant subspace
along the c-stable Z-invariant subspace. Therefore, the sign function provides a
tool for computing a spectral decomposition with respect to the imaginary axis. As
we will see in the following subsection, the sign function method can be applied
implicitly to $Y^{-1}Z$ so that also the corresponding spectral projectors for matrix
pencils $Z - \lambda Y$ with invertible Y can be computed. This can be used for
continuous-time stabilization problems once the infinite eigenvalues of $A - \lambda E$ have
been deflated.



3.3.3 Computation of the Sign Function

The sign function can be computed via the Newton iteration for the equation
$Z^2 = I_n$ where the starting point is chosen as Z, i.e.,

$$Z_0 \leftarrow Z, \quad Z_{j+1} \leftarrow \frac{1}{2}\left(Z_j + Z_j^{-1}\right), \quad j = 0, 1, \ldots \tag{3.7}$$

Under the given assumptions, the sequence $\{Z_j\}_{j=0}^{\infty}$ converges to $\operatorname{sign}(Z) = \lim_{j \to \infty} Z_j$ [36]
with an ultimately quadratic convergence rate. As the initial convergence may be slow,
the use of acceleration techniques is recommended; e.g.,
determinantal scaling [17] adds the following step to (3.7):

$$Z_j \leftarrow \frac{1}{c_j} Z_j, \quad c_j = |\det(Z_j)|^{1/n}, \tag{3.8}$$

where $\det(Z_j)$ denotes the determinant of $Z_j$. For a summary of different
strategies for accelerating the convergence of the Newton iteration, see [28]. It should
be noted that eigenvalues close to the imaginary axis may delay convergence
considerably, with stagnation in the limiting case of eigenvalues on the imaginary
axis.
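The Newton iteration (3.7) with determinantal scaling (3.8) can be sketched in a few lines of Python/NumPy. This is an illustrative fragment under stated assumptions, not a production implementation: the function name `matrix_sign`, the stopping tolerance, the use of an explicit inverse, and the test matrix with eigenvalues -1, -3, 2, 5 are all chosen here for demonstration only.

```python
import numpy as np

def matrix_sign(Z, tol=1e-12, maxit=100):
    """Newton iteration (3.7) with determinantal scaling (3.8)."""
    n = Z.shape[0]
    Zj = np.array(Z, dtype=float)
    for _ in range(maxit):
        # c_j = |det(Z_j)|^(1/n), evaluated via the log-determinant to avoid overflow
        _, logabsdet = np.linalg.slogdet(Zj)
        Zj = Zj / np.exp(logabsdet / n)
        Znew = 0.5 * (Zj + np.linalg.inv(Zj))
        # stop on a small relative change of the iterates
        if np.linalg.norm(Znew - Zj, 1) <= tol * np.linalg.norm(Znew, 1):
            return Znew
        Zj = Znew
    return Zj

# Hypothetical test matrix Z = T D T^{-1} with two stable eigenvalues.
T = np.eye(4) + np.eye(4, k=1)
D = np.diag([-1.0, -3.0, 2.0, 5.0])
Z = T @ D @ np.linalg.inv(T)

S = matrix_sign(Z)
P_minus = 0.5 * (np.eye(4) - S)      # projector onto the c-stable invariant subspace
P_plus = 0.5 * (np.eye(4) + S)       # projector onto the c-unstable invariant subspace
print(int(round(np.trace(P_minus))), int(round(np.trace(P_plus))))
```

The traces of the two projectors give the numbers of stable and unstable eigenvalues, respectively.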
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Algorithm 1 Sign function method. 

INPUT: A matrix pencil $Z - \lambda Y$, $Z, Y \in \mathbb{R}^{n \times n}$, with no eigenvalues on the
imaginary axis.
OUTPUT: Oblique spectral projectors $\mathcal{P}^-$ and $\mathcal{P}^+$ onto the c-stable and c-unstable,
respectively, deflating subspaces of $Z - \lambda Y$.
1: Set $Z_0 = Z$, $Y_0 = Y$. {Newton iteration}
2: for $j = 0, 1, \ldots$ until convergence do
3:   $Z_j = \Pi^T L U$ {LU factorization: L/U lower/upper triangular, $\Pi$ permutation matrix},
4:   $c_j = \prod_{k=1}^{n} |u_{kk}|^{1/n}$,
5:   Solve $L W = \Pi Y$ by forward substitution,
6:   Solve $U X = W$ by backward substitution,
7:   $Z_{j+1} = \frac{1}{2 c_j} Z_j + \frac{c_j}{2} Y X$,
8:   $j = j + 1$.
9: end for
10: Set $\mathcal{P}^- := Z_\infty - Y$, $\mathcal{P}^+ := Z_\infty + Y$.
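A compact Python/NumPy sketch of Algorithm 1 is given below. Several points are assumptions of this sketch rather than part of the algorithm as stated: the function name `pencil_sign`, the tolerance, the test pencil, the use of `numpy.linalg.solve` (which performs an LU factorization with partial pivoting internally) in place of the explicit triangular solves, and the scaling $c_j = (|\det(Z_j)|/|\det(Y)|)^{1/n}$ evaluated via log-determinants.

```python
import numpy as np

def pencil_sign(Z, Y, tol=1e-12, maxit=100):
    """Generalized Newton iteration for the pencil Z - lambda*Y; returns (Z_inf - Y, Z_inf + Y)."""
    n = Z.shape[0]
    Zj = np.array(Z, dtype=float)
    _, logdet_Y = np.linalg.slogdet(Y)
    for _ in range(maxit):
        _, logdet_Z = np.linalg.slogdet(Zj)
        c = np.exp((logdet_Z - logdet_Y) / n)       # determinantal scaling
        X = np.linalg.solve(Zj, Y)                  # X = Z_j^{-1} Y
        Znew = (Zj + c**2 * (Y @ X)) / (2.0 * c)
        done = np.linalg.norm(Znew - Zj, 1) <= tol * np.linalg.norm(Znew, 1)
        Zj = Znew
        if done:
            break
    return Zj - Y, Zj + Y

# Hypothetical test pencil with Y^{-1} Z = T D T^{-1}, eigenvalues -1, -3, 2, 5.
T = np.eye(4) + np.eye(4, k=1)
M = T @ np.diag([-1.0, -3.0, 2.0, 5.0]) @ np.linalg.inv(T)
Y = 2.0 * np.eye(4) + np.roll(np.eye(4), 1, axis=1)
Z = Y @ M

Pm, Pp = pencil_sign(Z, Y)
# Y^{-1} (Z_inf - Y) = sign(Y^{-1} Z) - I, so the sign matrix can be recovered as:
S = np.linalg.solve(Y, Pm) + np.eye(4)
print(np.round(np.linalg.eigvals(S).real, 6))
```

The last lines check the equivalence with the sign function of $Y^{-1}Z$: multiplying $Z_\infty - Y$ by $Y^{-1}$ yields $\operatorname{sign}(Y^{-1}Z) - I$, i.e., $-2$ times the projector $\frac{1}{2}(I - \operatorname{sign}(Y^{-1}Z))$.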



In our case, we will have to apply the sign function method to a matrix pencil
rather than a single matrix. Therefore, we employ a generalization of the matrix
sign function method to a matrix pencil $Z - \lambda Y$ given in [23]. Assuming that Z and
Y are nonsingular, the generalized Newton iteration for the matrix sign function is
given by

$$Z_0 \leftarrow Z, \quad Z_{j+1} \leftarrow \frac{1}{2 c_j}\left(Z_j + c_j^2\, Y Z_j^{-1} Y\right), \quad j = 0, 1, \ldots, \tag{3.9}$$

with the scaling now defined as $c_j \leftarrow \left(\frac{|\det(Z_j)|}{|\det(Y)|}\right)^{1/n}$. This iteration is equivalent to
computing the sign function of the matrix $Y^{-1}Z$ via the Newton iteration as given
in (3.7). If $\lim_{j \to \infty} Z_j = Z_\infty$, then $Z_\infty - Y$ defines the oblique projection onto the
c-stable right deflating subspace of $Z - \lambda Y$ along the c-unstable deflating
subspace, and $Z_\infty + Y$ defines the oblique projection onto the c-unstable right
deflating subspace of $Z - \lambda Y$ along the c-stable deflating subspace.

As a basis for the c-stable invariant subspace of a matrix Z or the c-stable
deflating subspace of a matrix pencil $Z - \lambda Y$ is given by the range of any projector
onto this subspace, it can be computed by a RRQR factorization of the
corresponding projectors $\mathcal{P}^-$ or $Z_\infty - Y$, respectively.

A formal description of the suggested algorithm for computing spectral
projectors onto the c-stable and c-unstable deflating subspaces of matrix pencils is
given in Algorithm 1. The necessary matrix inversion is realized by LU
decomposition with partial pivoting and forward/backward solves. Note that no explicit
multiplication with the permutation matrix $\Pi$ is necessary; this is realized by
swapping rows of X or columns of Y. The stopping criterion for the iteration is
usually based on relative changes in the iterates $Z_j$.

For block-triangularization related to spectral division with respect to the
imaginary axis, matrices U, V as in (3.5) can be obtained from a RRQR
factorization of $\mathcal{P}^-$ and Proposition 4. An explicit algorithm for this purpose is provided
in [42]. As we will see later, for our purposes it is not necessary to form U
explicitly; just the last $n - n_1$ columns of U need to be accumulated. We will come
back to this issue in the context of the disk function method in more detail; see Eq.
(3.10) and the discussion given there.

Much of the appeal of the matrix sign function approach comes from the high 
parallelism of the matrix kernels that compose the Newton-type iterations [34]. 
Efficient parallelization of this type of iterations for the matrix sign function has 
been reported, e.g., in [4, 27]. An approach that basically reduces the cost of the 
generalized Newton iteration to that of the Newton iteration is described in [41]. 
An inverse-free version of the generalized sign function method which is about 1.5 
times as expensive as Algorithm 1 is discussed in [10]. 

Unfortunately, the matrix sign function is not directly applicable to descriptor 
systems. If Y (= E in our case) is singular, then convergence of the generalized
Newton iteration (3.9) is still possible if the index is less than or equal to 2 [41], 
but convergence will only be linear. Moreover, as we do not want to restrict 
ourselves to descriptor systems with low index, we will need another spectral 
projection method that can be computed by a quadratically convergent algorithm 
without restrictions on the index. The disk function method described in the fol- 
lowing subsection will satisfy these demands. 



3.3.4 The Matrix Disk Function 

The right matrix pencil disk function can be defined for a regular matrix pencil
$Z - \lambda Y$, $Z, Y \in \mathbb{R}^{n \times n}$, as follows: Let

$$Z - \lambda Y = S \begin{bmatrix} J^0 - \lambda I_k & 0 \\ 0 & J^\infty - \lambda N \end{bmatrix} T^{-1}$$

denote the Kronecker (Weierstrass) canonical form of the matrix pencil (see,
e.g., [30] and references therein), where $J^0 \in \mathbb{C}^{k \times k}$ and $J^\infty - \lambda N \in \mathbb{C}^{(n-k) \times (n-k)}$ contain,
respectively, the Jordan blocks corresponding to the eigenvalues of $Z - \lambda Y$
inside and outside the unit circle. Then, the matrix (pencil) disk function is
defined as

$$\operatorname{disk}(Z, Y) := T \left( \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} - \lambda \begin{bmatrix} 0 & 0 \\ 0 & I_{n-k} \end{bmatrix} \right) T^{-1} =: \mathcal{P}^0 - \lambda \mathcal{P}^\infty.$$

Alternative definitions of the disk function are given in [9].

The matrix pencil disk function can be used to compute projectors onto
deflating subspaces of a matrix pencil as $\mathcal{P}^0$ is an oblique projection onto the
right d-stable deflating subspace of $Z - \lambda Y$ along the d-unstable one, and $\mathcal{P}^\infty$
defines an oblique projection onto the right d-unstable deflating subspace of
$Z - \lambda Y$ along the d-stable one. Thus, the disk function provides a tool for
spectral decomposition along the unit circle. Splittings with respect to other
curves in the complex plane can be computed by applying a suitable conformal
mapping to $Z - \lambda Y$ [5]. As we want to use the disk function in order to compute
a projector onto the deflating subspace corresponding to the infinite or finite
eigenvalues, we will need a curve enclosing all finite eigenvalues. We will come
back to this issue in Sect. 3.4.

In the next subsection we discuss how the disk function can be computed 
iteratively without having to invert any of the iterates. 



3.3.5 Computing the Disk Function 

The algorithm discussed here is taken from [5], and is based on earlier work by 
Malyshev [31]. This algorithm is generally referred to as inverse-free iteration. We 
also make use of improvements suggested in [42] to reduce its cost. 

Given a regular matrix pencil $Z - \lambda Y$ having no eigenvalues on the unit circle,
Algorithm 2 provides an implementation of the inverse-free iteration which
computes an approximation to the right deflating subspace corresponding to the
eigenvalues inside the unit circle. It is based on a generalized power iteration (see
[7, 42] for more details) and the fact that (see [7, 31])

$$\lim_{j \to \infty} (Z_j + Y_j)^{-1} Y_j = \mathcal{P}^0, \qquad \lim_{j \to \infty} (Z_j + Y_j)^{-1} Z_j = \mathcal{P}^\infty.$$

Convergence of the algorithm is usually checked based on the relative change
in $R_j$. Note that the QR decomposition in Algorithm 2 is unique if we choose positive
diagonal elements as $[Y_j^T, -Z_j^T]^T$ has full rank in all steps [24].

The convergence of the inverse free iteration can be shown to be globally 
quadratic [5] with deferred convergence in the presence of eigenvalues very close 
to the unit circle and stagnation in the limiting case of eigenvalues on the unit 
circle. Also, the method is proven to be numerically backward stable in [5]. Again, 
accuracy problems are related to eigenvalues close to the unit circle due to the fact 
that the spectral decomposition problem becomes ill-conditioned in this case. 

The price paid for avoiding matrix inversions during the iteration is that every
iteration step is about twice as expensive as one step of the Newton iteration (3.9)
in the matrix pencil case and three times as expensive as the Newton iteration (3.7)
in the matrix case. On the other hand, the inverse-free iteration can be implemented
very efficiently on a parallel distributed-memory architecture like the sign function
method since it is based on matrix multiplications and QR factorizations which are
well studied problems in parallel computing; see, e.g., [24, 35] and the references
given therein. Also, a recently proposed version yields a reduced cost per iteration
step which leads to the conclusion that usually, the inverse-free iteration is faster
than the QZ algorithm if less than 30 iterations are required; usually, convergence
can be observed after only 10-20 iterations. See [32] for details.

Algorithm 2 Inverse Free Method.

INPUT: A matrix pencil $Z - \lambda Y$, $Z, Y \in \mathbb{R}^{n \times n}$, with no eigenvalues on the unit
circle.
OUTPUT: The matrix pencil disk function of $Z - \lambda Y$.
1: Set $Z_0 = Z$, $Y_0 = Y$. {Inverse free iteration}
2: for $j = 0, 1, \ldots$ until convergence do
3:   $\begin{bmatrix} Y_j \\ -Z_j \end{bmatrix} = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix} \begin{bmatrix} R_j \\ 0 \end{bmatrix}$ (QR factorization),
4:   $Z_{j+1} = U_{12}^T Z_j$,
5:   $Y_{j+1} = U_{22}^T Y_j$,
6:   $j = j + 1$.
7: end for
8: Set $\operatorname{disk}(Z, Y) := (Z_s + Y_s)^{-1} (Y_s - \lambda Z_s)$.
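The inverse-free iteration of Algorithm 2 can be sketched as follows in Python/NumPy. The function name `inverse_free`, the tolerance, and the test pencil (with eigenvalues 0.2 and -0.5 inside and 3 and -4 outside the unit circle) are assumptions of this sketch. The projector is formed explicitly at the end only to check the limit relation; as stated in the text, this is not needed in the stabilization procedure itself.

```python
import numpy as np

def inverse_free(Z, Y, tol=1e-12, maxit=60):
    """Inverse-free iteration; returns the final iterates (Z_s, Y_s)."""
    n = Z.shape[0]
    Zj, Yj = np.array(Z, dtype=float), np.array(Y, dtype=float)
    R_old = None
    for _ in range(maxit):
        Q, R = np.linalg.qr(np.vstack([Yj, -Zj]), mode="complete")
        R = R[:n, :]
        s = np.sign(np.diag(R))
        s[s == 0] = 1.0
        R = s[:, None] * R                 # positive diagonal makes R_j unique
        Zj = Q[:n, n:].T @ Zj              # Z_{j+1} = U_12^T Z_j
        Yj = Q[n:, n:].T @ Yj              # Y_{j+1} = U_22^T Y_j
        if R_old is not None and np.linalg.norm(R - R_old, 1) <= tol * np.linalg.norm(R, 1):
            break
        R_old = R
    return Zj, Yj

# Hypothetical test pencil with Y^{-1} Z = T D T^{-1}.
T = np.eye(4) + np.eye(4, k=1)
M = T @ np.diag([0.2, -0.5, 3.0, -4.0]) @ np.linalg.inv(T)
Y = 2.0 * np.eye(4) + np.roll(np.eye(4), 1, axis=1)
Z = Y @ M

Zs, Ys = inverse_free(Z, Y)
P0 = np.linalg.solve(Zs + Ys, Ys)          # approximates the d-stable projector
Pinf = np.linalg.solve(Zs + Ys, Zs)        # approximates the d-unstable projector
print(int(round(np.trace(P0))), int(round(np.trace(Pinf))))
```

The iteration squares $Y_j^{-1} Z_j$ implicitly in every step, so that $(Z_j + Y_j)^{-1} Y_j = (I + (Y^{-1}Z)^{2^j})^{-1}$ tends to the identity on the d-stable deflating subspace and to zero on the d-unstable one.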

It should be noted that for our purposes, neither the disk function nor the
projectors $\mathcal{P}^0$ and $\mathcal{P}^\infty$ need to be computed explicitly. All we need are the related
matrices U, V from (3.5). This requires orthogonal bases for the range and nullspace
of these projectors. These can be obtained using a clever subspace extraction
technique proposed in [42]. The main idea here is that a basis for the range of $\mathcal{P}^0$
can be obtained from the kernel of $\mathcal{P}^\infty$. This can be computed from $Z_s$ directly,
i.e., $Z_s + Y_s$ is never inverted, neither explicitly nor implicitly. The left deflating
subspace is then computed based on Proposition 4. The complete subspace
extraction technique yielding the matrices U, V as in (3.5) with $\Lambda(Z_{22}, Y_{22})$ being
the d-unstable part of the spectrum of $Z - \lambda Y$ can be found in Algorithm 3.

Note that the triangular factors and permutation matrices in Algorithm 3 are 
not needed and can be overwritten. For our purposes, we can save some 
more workspace and computational cost. It is actually sufficient to store V (which 
is later on needed to recover the feedback matrix of the original system) and to 
compute 



$$Z_{22} = U_2^T Z V_2, \qquad Y_{22} = U_2^T Y V_2. \tag{3.10}$$



This can be exploited if an efficient implementation of the QR decomposition like
the one in LAPACK [1] is available. Accumulation of $U_2$ only is possible there, so
that the flop count is proportional to $nr(n - r)$ rather than to $n^3$ as when
accumulating the full matrix U. Moreover, instead of $8n^3$ flops as needed for the four
matrix products in (3.5), the computations in (3.10) require only $4nr(n + r)$ flops.
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Algorithm 3 Subspace Extraction for Disk Function Method. 

INPUT: A matrix pencil $Z - \lambda Y$, $Z, Y \in \mathbb{R}^{n \times n}$, with no eigenvalues on the unit circle,
$Z_s$ as computed by Algorithm 2, and a tolerance $\tau$ for rank detection.
OUTPUT: Matrices U, V upper triangularizing $Z - \lambda Y$ so that with

$$U^T (Z - \lambda Y) V = \begin{bmatrix} Z_{11} & Z_{12} \\ 0 & Z_{22} \end{bmatrix} - \lambda \begin{bmatrix} Y_{11} & Y_{12} \\ 0 & Y_{22} \end{bmatrix}$$

we have $\Lambda(Z_{11}, Y_{11}) \subset \{z \in \mathbb{C} \mid |z| < 1\}$, $\Lambda(Z_{22}, Y_{22}) \subset \{z \in \mathbb{C} \mid |z| > 1\}$.
1: {Compute range of $\mathcal{P}^0$, i.e., nullspace of $Z_s$.}
   Compute the RRQR

   $$Z_s^T = [V_2 \;\; V_1] \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \Pi^T,$$

   where the partitioning is determined with respect to the numerical rank $r = r(\tau)$
   of $Z_s$ based on $\tau$ and the columns of $V_1 \in \mathbb{R}^{n \times (n-r)}$ form an orthonormal basis
   for $\ker Z_s$.
2: Set $V := [V_1 \;\; V_2]$.
3: {Compute basis for the left deflating subspaces.}
   Compute the QR decomposition with column pivoting

   $$U T \Pi^T = [U_1 \;\; U_2]\, T\, \Pi^T = [Z V_1 \;\; Y V_1],$$

   with the same partitioning with respect to $r(\tau)$ as above.
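The subspace extraction of Algorithm 3 can be sketched with SciPy's pivoted QR factorization (`scipy.linalg.qr` with `pivoting=True`). The function names, the fixed number of inverse-free steps, the relative rank tolerance, and the test pencil are assumptions of this sketch; the column-pivoted QR is used as a simple stand-in for a rank-revealing QR factorization.

```python
import numpy as np
from scipy.linalg import qr

def inverse_free(Z, Y, steps=8):
    """A fixed number of inverse-free steps (sketch only; no convergence test)."""
    n = Z.shape[0]
    Zj, Yj = np.array(Z, dtype=float), np.array(Y, dtype=float)
    for _ in range(steps):
        Q, _ = np.linalg.qr(np.vstack([Yj, -Zj]), mode="complete")
        Zj, Yj = Q[:n, n:].T @ Zj, Q[n:, n:].T @ Yj
    return Zj, Yj

def extract_subspaces(Z, Y, Zs, tau=1e-10):
    """Return U, V and k = n - r such that U^T (Z - lambda*Y) V is block upper triangular."""
    n = Z.shape[0]
    Q, R, _ = qr(Zs.T, pivoting=True)              # Zs^T Pi = Q R
    d = np.abs(np.diag(R))
    r = int(np.sum(d > tau * d[0]))                # numerical rank of Zs
    V1, V2 = Q[:, r:], Q[:, :r]                    # V1 spans ker(Zs)
    V = np.hstack([V1, V2])
    U, _, _ = qr(np.hstack([Z @ V1, Y @ V1]), pivoting=True)
    return U, V, n - r

# Hypothetical test pencil: eigenvalues 0.2, -0.5 inside, 3, -4 outside the unit circle.
T = np.eye(4) + np.eye(4, k=1)
M = T @ np.diag([0.2, -0.5, 3.0, -4.0]) @ np.linalg.inv(T)
Y = 2.0 * np.eye(4) + np.roll(np.eye(4), 1, axis=1)
Z = Y @ M

Zs, _ = inverse_free(Z, Y)
U, V, k = extract_subspaces(Z, Y, Zs)
Zt, Yt = U.T @ Z @ V, U.T @ Y @ V
inner = np.linalg.eigvals(np.linalg.solve(Yt[:k, :k], Zt[:k, :k]))
outer = np.linalg.eigvals(np.linalg.solve(Yt[k:, k:], Zt[k:, k:]))
print(np.sort(inner.real), np.sort(outer.real))
```

The lower left blocks of the transformed matrices vanish up to rounding, the leading block carries the eigenvalues inside the unit circle, and the trailing block carries those outside.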



3.4 Partial Stabilization Using Spectral Projection 



For the derivation of our algorithm, we will assume that the descriptor system (3.1)
has $n_f$ finite and $n_\infty$ infinite poles, i.e., $\Lambda(A, E) = \Lambda_f \cup \{\infty\}$ with $|\Lambda_f| = n_f$.
Moreover, we assume that there are $n_1$ stable and $n_2$ unstable poles, i.e.,
$\Lambda(A, E) = \Lambda_1 \cup \Lambda_2$, where $\Lambda_1 \subset \Gamma_1$, $\Lambda_2 \subset \mathbb{C} \setminus \Gamma_1$, and $|\Lambda_1| = n_1$, $|\Lambda_2| = n_2 + n_\infty$.
This yields the relations $n = n_f + n_\infty = n_1 + n_2 + n_\infty$. Our partial stabilization
algorithm based on the disk function method will consist of the following steps:

1. Deflate the infinite poles of the system using a spectral projector onto the
corresponding deflating subspace of $A - \lambda E$ computed by the disk function
method. We call the resulting $n_f \times n_f$ matrix pencil $A_1 - \lambda E_1$. Note that now,
$E_1$ is nonsingular.

Transform B accordingly, yielding $B_1$.

2. Deflate the stable finite poles of the system using a spectral projector onto the
deflating subspace of $A_1 - \lambda E_1$. This can be computed by the disk (sign)
function method applied to $(A_1, E_1)$ in the discrete-time (continuous-time) case
or to the Cayley-transformed matrix pair $(A_1 + E_1, A_1 - E_1)$ in the
continuous-time (discrete-time) case. We call the resulting $n_2 \times n_2$ matrix pencil $A_2 - \lambda E_2$.
Note that now, $A_2 - \lambda E_2$ has only unstable eigenvalues.

Transform $B_1$ accordingly, yielding $B_2$.

3. Solve the stabilization problem for the generalized state-space system

$$E_2 (\mathcal{D} x_2(t)) = A_2 x_2(t) + B_2 u(t), \quad t > 0, \quad x_2(0) = x_{2,0}.$$

4. Assemble the stabilizing feedback for the original system (3.1) using either
Proposition 2 or Proposition 3.

A possible implementation, based only on the disk function method, is
summarized in Algorithm 4. An analogous implementation using the sign function
method in Step 2 can easily be derived by replacing Algorithm 2 with Algorithm 1
there (and changing the order of the if-else conditions).
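The role of the Cayley transformation in Step 2 can be made concrete with a small numerical check. If $A_1 x = \lambda E_1 x$, then $(A_1 + E_1)x = \mu (A_1 - E_1)x$ with $\mu = (\lambda + 1)/(\lambda - 1)$, so that $\mathrm{Re}(\lambda) < 0$ is equivalent to $|\mu| < 1$. The Python/NumPy fragment below verifies this for a hypothetical $3 \times 3$ pencil; the matrices are chosen here for illustration, and it is assumed that $E_1$ and $A_1 - E_1$ are nonsingular.

```python
import numpy as np

# Hypothetical pencil (A1, E1) with eigenvalues -1, 1.5, -2.
E1 = np.diag([1.0, 2.0, 4.0])
A1 = np.array([[-1.0, 2.0, 0.0],
               [0.0, 3.0, 1.0],
               [0.0, 0.0, -8.0]])

lam = np.linalg.eigvals(np.linalg.solve(E1, A1))            # eigenvalues of A1 - lambda*E1
mu = np.linalg.eigvals(np.linalg.solve(A1 - E1, A1 + E1))   # eigenvalues of (A1+E1) - mu*(A1-E1)

n_c_stable = int(np.sum(lam.real < 0))      # poles in the open left half plane
n_d_stable = int(np.sum(np.abs(mu) < 1))    # transformed poles inside the unit circle
print(n_c_stable, n_d_stable)
```

Both counts agree, which is why the disk function method applied to the transformed pair separates the c-stable from the c-unstable part of the spectrum.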



Algorithm 4 Partial Stabilization Using the Disk Function Method. 

INPUT: A stabilizable descriptor system as in (3.1) with $A - \lambda E$ regular; $\alpha$ so that a
circle around the origin with radius $1/\alpha$ encloses all finite eigenvalues of $A - \lambda E$.

OUTPUT: A stabilizing feedback matrix $F \in \mathbb{R}^{m \times n}$.
1: Apply the disk function method to $(E, \alpha A)$ to obtain a spectral projector onto
the deflating subspace corresponding to the finite eigenvalues of $A - \lambda E$ and
use Algorithm 3 to compute an orthogonal equivalence transformation to
block-triangular form such that $\Lambda(A, E)$ is divided into

$$Q_1 (A - \lambda E) Z_1 = \begin{bmatrix} A_\infty & A_{12} \\ 0 & A_1 \end{bmatrix} - \lambda \begin{bmatrix} E_\infty & E_{12} \\ 0 & E_1 \end{bmatrix},$$

where $\Lambda(A_\infty, E_\infty) = \{\infty\}$, $\Lambda(A_1, E_1)$ is finite, $E_1$ nonsingular, and partition
$Q_1 B = \begin{bmatrix} B_\infty \\ B_1 \end{bmatrix}$ accordingly.

2: if the system is continuous-time,
     apply Algorithm 2 to $(A_1 + E_1, A_1 - E_1)$
   else
     apply Algorithm 2 to $(A_1, E_1)$
   endif
   in order to compute an orthogonal equivalence transformation to
   block-triangular form such that $\Lambda(A_1, E_1)$ is divided according to the boundary of
   the stability region, i.e.,

$$Q_2 (A_1 - \lambda E_1) Z_2 = \begin{bmatrix} A_{\mathrm{stab}} & * \\ 0 & A_2 \end{bmatrix} - \lambda \begin{bmatrix} E_{\mathrm{stab}} & * \\ 0 & E_2 \end{bmatrix},$$

where $\Lambda(A_{\mathrm{stab}}, E_{\mathrm{stab}}) \subset \Gamma_1$, $\Lambda(A_2, E_2) \subset \Gamma_2$, and partition
$Q_2 B_1 = \begin{bmatrix} B_{\mathrm{stab}} \\ B_2 \end{bmatrix}$ accordingly.

3: Compute $F_2$ using either Proposition 2 applied to $(A_2, E_2, B_2)$ or Proposition 3
applied to $(A_2, E_2, B_2)$ in the continuous-time case and to $(A_2 + E_2, A_2 - E_2, B_2)$
in the discrete-time case.

4: Set $F := [0, \; [0, \; F_2] Z_2^T] Z_1^T$.

The determination of the input parameter $\alpha$ in Algorithm 4 is in itself a
non-trivial task. Often, knowledge about the spectral distribution can be obtained
a priori from the physical modeling. It is an open problem how to determine $\alpha$
directly from the entries of the matrices A, E. In principle, the generalized
Gershgorin theory derived in [38] provides computable regions in the closed
complex plane containing the finite eigenvalues. In practice, though, these regions
often extend to infinity and thus provide no useful information. Recent
developments in [29] give rise to the hope that computable finite bounds can be obtained;
this will be further explored in the future.
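The effect of the parameter $\alpha$ in Step 1 of Algorithm 4 can be illustrated numerically. The eigenvalues $\mu$ of the pencil $E - \mu(\alpha A)$ are related to the eigenvalues $\lambda$ of $A - \lambda E$ by $\mu = 1/(\alpha\lambda)$: infinite eigenvalues are mapped to $\mu = 0$, while finite eigenvalues with $|\lambda| < 1/\alpha$ are mapped outside the unit circle. The following Python/NumPy sketch uses a hypothetical pencil with finite eigenvalues -2 and 3 and two infinite eigenvalues; the matrices, the value $\alpha = 0.25$, and the assumption that A is nonsingular are made for illustration only.

```python
import numpy as np

# Hypothetical descriptor pencil A - lambda*E, built from a diagonal pencil
# by an equivalence transformation with nonsingular L and R.
E0 = np.diag([1.0, 1.0, 0.0, 0.0])
A0 = np.diag([-2.0, 3.0, 1.0, 1.0])
L = np.eye(4) + np.eye(4, k=1)
R = 2.0 * np.eye(4) + np.roll(np.eye(4), 1, axis=1)
E, A = L @ E0 @ R, L @ A0 @ R

alpha = 0.25                       # circle of radius 1/alpha = 4 encloses -2 and 3
mu = np.linalg.eigvals(np.linalg.solve(alpha * A, E))   # eigenvalues of E - mu*(alpha*A)

n_inside = int(np.sum(np.abs(mu) < 1))    # images of the infinite eigenvalues
n_outside = int(np.sum(np.abs(mu) > 1))   # images of the finite eigenvalues
print(n_inside, n_outside, np.sort(np.abs(mu)))
```

Hence the d-stable deflating subspace of $(E, \alpha A)$ corresponds to the infinite eigenvalues of $A - \lambda E$ and the d-unstable one to the finite eigenvalues, provided that $\alpha$ is chosen small enough.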

The solution of the matrix equations needed in Step 3 of Algorithm 4 (either 
one of the Lyapunov equations from (3.4) or the ABE (3.2)) can be obtained in 
several ways, see, e.g., [20, 37] for a discussion of different Lyapunov solvers. As 
we base our partial stabilization algorithm on spectral projection methods like the 
sign and disk function methods under the assumption that these are particularly 
efficient on current computer architectures, it is quite natural to also use the sign 
function methods for the generalized Lyapunov and Bernoulli equations in (3.4) 
and (3.2). Efficient algorithms for this purpose are derived and discussed in detail 
in [6, 13, 15]. Note that the matrix pencils defining the Lyapunov operators in (3.4) 
have all their eigenvalues in $\Gamma_2$. Thus the methods from [13, 15] can be applied to
these equations. Our numerical results in Sect. 3.5 are all computed using these 
sign function based matrix equation solvers. 

Remark 1 Due to the usual ill-conditioning of the stabilization problem, it turns
out that sometimes the computed closed-loop poles are not all stable. In that case,
we suggest applying Steps 2-3 of Algorithm 4 again to $(A_2 - B_2 F_2, E_2)$, resulting
in a feedback matrix

$$F := [0, \; [0, \; F_2 + [0, \; F_3] Z_3^T] Z_2^T] Z_1^T.$$

This is often sufficient to completely stabilize the system numerically, see
Example 2. Otherwise, Steps 2-3 should be repeated until stabilization is
achieved.

Remark 2 So far we have assumed spectral dichotomy with respect to the
boundary curve $\partial\Gamma_1$ of the stability region. For Step 1 of Algorithm 4, this is not
necessary. It becomes an issue only in the following steps, but can easily be
resolved. For the spectral decomposition with respect to the boundary of the
stability region performed in Step 2, we can simply shift/scale so that the
eigenvalues on $\partial\Gamma_1$ are moved to $\Gamma_2$. This may also be advisable if stable eigenvalues are
close to the boundary of the stability region and should be made "more stable".
This basic idea is implemented, for instance, in the Descriptor System and
Rational Matrix Manipulation Toolbox [44]. For Algorithm 4 this means that in
Step 2, the spectral decomposition is computed for $(A_1 + \alpha_c E_1, E_1)$ ($\alpha_c > 0$) in the
continuous-time case and for $(A_1, \alpha_d E_1)$ ($0 < \alpha_d < 1$) in the discrete-time case. The
resulting matrix pencil $(A_2, E_2)$ will then again have eigenvalues on $\partial\Gamma_1$ (or stable
ones close to it). This can be treated in Step 3 again with shifting/scaling: when
applying Proposition 2, simply set $\beta_c > 0$ or $\beta_d < 1$ (as already advised in Sect.
3.2), while if Proposition 3 is to be used, it is applied to $(A_2 + \alpha_c E_2, E_2, B_2)$ with
$\alpha_c > 0$. (Note that Proposition 3 applies only in the continuous-time case.)

In the following section, we will test the suggested partial stabilization method 
as presented in Algorithm 4 for several problems, in particular for stabilization 
problems for linear(ized) flow problems. 



3.5 Numerical Examples 

In order to demonstrate the effect of the partial stabilization method on the poles of 
the system (3.1), we implemented the generalized Bass and Bernoulli versions of 
Algorithm 4 as Matlab functions. For the solution of the Lyapunov and Bernoulli 
equations in (3.4) and (3.2) we employ Matlab implementations of the sign 
function based solvers described in detail in [6, 13, 15]. 

Note that there is no software for partial stabilization to compare to as even the
Matlab function gstab from the Descriptor System and Rational Matrix
Manipulation Toolbox (Descriptor Toolbox for short) [44] only treats systems
with invertible E. Nevertheless, we compare our results with those obtained by
gstab applied to the projected problem resulting from Step 1 of Algorithm 4.
Moreover, we tried to apply the pole placement function gplace from the
Descriptor Toolbox on each level, i.e., for the full descriptor system and the
projected systems after Steps 1 and 2 of Algorithm 4. All computations were
performed using Matlab release R2006a with IEEE double precision arithmetic
on Windows XP notebooks with Pentium M CPU, running either at 1.13 GHz
with 512 MB of main memory or at 2.13 GHz with 1 GB of main memory.

Example 1 In order to be able to distinguish all eigenvalues in the plots, we first use
a small scale example with $n = 20$, $m = 3$. To this end, a diagonal matrix pencil with
ten infinite eigenvalues and finite spectrum $\{-4.5, -3.5, \ldots, -0.5, 5.5, \ldots, 9.5\}$ is
generated. Thus, five unstable poles have to be stabilized. Our algorithm computes
a feedback matrix with moderate norm: $\|F\|_2 \approx 148$ using the generalized Bass
stabilization and $\|F\|_2 \approx 232$ when using the algebraic Bernoulli equation. Thus,
the norm of the gain is quite reduced in the generalized Bass case as $\|F_1\|_2 \approx 4{,}490$
if applied to the system described by $(A_1, B_1, E_1)$. Note that in the Bernoulli case,
there is no reduction in the norm of the gain matrix, which can be expected as in an
appropriate basis, the solution $X_1$ for the larger problem is zero except for the
diagonal block in the lower right corner which is the Bernoulli solution of the small
fully unstable system. The full-scale feedback matrix results from applying
orthogonal transformations only to the small-size feedback so that the 2-norm is not
changed. The function gplace from the Descriptor System and Rational Matrix
Manipulation Toolbox works in this example as well: if we aim at placing the poles
at the location of the Bernoulli stabilized ones, then the corresponding feedback
gain matrix has norm $\approx 204$. The computed closed-loop poles are slightly less
accurate than in the Bernoulli approach, though, with relative errors in the
computed poles roughly four times larger.

The open- and closed-loop poles are displayed in Fig. 3.1. As usual for the Bass 
algorithm, the stabilized poles are placed on a vertical line in the left half plane 
while the stable open-loop poles are preserved. As expected from Proposition 3, 
the Bernoulli approach reflects the unstable poles with respect to the imaginary 
axis. 

Example 2 Consider the instationary Stokes equation describing the flow of an
incompressible fluid at low Reynolds numbers:

$$\begin{aligned} \frac{\partial v}{\partial t} &= \Delta v - \nabla p + f, & (\xi, t) &\in \Omega \times (0, t_f), \\ 0 &= \operatorname{div} v, & (\xi, t) &\in \Omega \times (0, t_f), \end{aligned} \tag{3.11}$$

with appropriate initial and boundary conditions. Here, $v(\xi, t) \in \mathbb{R}^2$ is the velocity
vector, $p(\xi, t) \in \mathbb{R}$ is the pressure, $f(\xi, t) \in \mathbb{R}^2$ is a vector of external forces,
$\Omega \subset \mathbb{R}^2$ is a bounded open domain and $t_f > 0$ is the endpoint of the considered
time interval. The spatial discretization of the Stokes equation (3.11) by a finite
volume method on a uniform staggered grid leads to a descriptor system, where the
matrix coefficients are sparse and have a special block structure given by

$$E = \begin{bmatrix} I_{n_v} & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^T & 0 \end{bmatrix};$$

see [40] and references therein. Here, $A_{11} \in \mathbb{R}^{n_v \times n_v}$ corresponds to the discretized
Laplace operator while $A_{12}$ and $A_{12}^T$ represent discretizations of the gradient and
divergence operators in (3.11).
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Fig. 3.1 Example 1. Open-loop and closed-loop poles computed using generalized Lyapunov 
and Bernoulli equations 
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The input matrix B can be obtained from several sources. Assuming volume 
forces, we can write $f(\xi, t) = b(\xi) u(t)$ and the matrix B is obtained from the
discretization of $b(\xi)$. Another possibility is boundary control.

The descriptor system obtained in this way is stable and of index 2 [40]. We
de-stabilize the system by adding $\alpha I_{n_v}$ to $A_{11}$ so that the index is preserved. Such a
term arises, e.g., when the volume forces are also proportional to the velocity field,
$f(\xi, t) = \tilde{\alpha} v(\xi, t) + b(\xi) u(t)$ (where we do not claim physical relevance of this
situation; $\alpha$ is then obtained from $\tilde{\alpha}$ by scaling related to the mesh size).

In all of the following computations, we consider a coarse $16 \times 16$ grid,
resulting in $n_v = 480$ velocity and $n_p = 255$ pressure variables so that $n = 735$.
This results in $n_\infty = 510$ infinite poles and 225 finite ones.
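The pole counts and the effect of the de-stabilizing shift can be reproduced on a toy problem with the same block structure. The sketch below is not the discretized Stokes system: it assumes $E = \operatorname{diag}(I_{n_v}, 0)$, a hypothetical negative definite $A_{11}$ (a one-dimensional discrete Laplacian), and a hypothetical full-rank $A_{12}$. It uses the fact that for this structure the finite eigenvalues are those of $N^T A_{11} N$, where the columns of N form an orthonormal basis of $\ker A_{12}^T$, so that there are $n_v - n_p$ finite poles and adding $\alpha I_{n_v}$ to $A_{11}$ shifts each of them by $\alpha$.

```python
import numpy as np

n_v, n_p = 6, 2
# Hypothetical stand-ins for the discretized Laplacian and gradient operators.
A11 = -(2.0 * np.eye(n_v) - np.eye(n_v, k=1) - np.eye(n_v, k=-1))
A12 = np.zeros((n_v, n_p))
A12[0, 0], A12[1, 0] = 1.0, -1.0
A12[3, 1], A12[4, 1] = 1.0, -1.0

# Orthonormal basis N of ker(A12^T) from a complete QR factorization of A12.
Q, _ = np.linalg.qr(A12, mode="complete")
N = Q[:, n_p:]

def finite_poles(alpha):
    """Finite eigenvalues of the pencil with A11 replaced by A11 + alpha*I."""
    return np.linalg.eigvalsh(N.T @ (A11 + alpha * np.eye(n_v)) @ N)

stable = finite_poles(0.0)       # all negative: the unshifted system is stable
shifted = finite_poles(4.0)      # shift large enough to make all finite poles unstable
print(stable, shifted)
```

With $n_v = 480$ and $n_p = 255$, the same count gives the 225 finite poles stated above.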

In the first setting, we mimic a situation where the volume forces are given by
the superposition of two different sources. Thus, $m = 2$, where we choose the two
columns of B at random. Choosing $\alpha = 100$, 3 poles become unstable. Altogether,
the stabilization problem turns out to be ill-conditioned enough to make gplace
fail to stabilize the descriptor system as well as the system projected on the
subspaces corresponding to the finite eigenvalues (resulting from Step 1 of
Algorithm 4) with all pole assignments we tried (random assignment, equally
distributed from $-1$ to $-n_f$). Note that gplace returns with an error message
when applied to the full descriptor system while for the projected system,
non-stabilizing feedbacks are returned without a warning. The Descriptor Toolbox
function gstab stabilized the projected system with $\|F\|_2 \approx 85.5$. Both our
approaches based on the generalized Bass algorithm and on the algebraic Bernoulli
equation succeed in stabilizing the system, where $\|F_{\mathrm{Bass}}\|_2 \approx 170$ and
$\|F_{\mathrm{ABE}}\|_2 \approx 233$. It appears that here, the stabilization of the fully unstable, small
system requires more effort than the stabilization of the larger projected system.
Open- and closed-loop poles for both approaches are shown in Fig. 3.2.

From the close-up (bottom row in Fig. 3.2) we can again see that the Bernoulli 
approach reflects the unstable poles with respect to the origin. It should be noted, 
though, that some of the computed closed-loop poles do not possess the desirable 
property to come out as real eigenvalues (all finite poles are real in this example), 
but the imaginary parts (which are zero in exact arithmetic) are rather small (of 
order IQ-'^). 

In our second set of tests we use an input matrix related to boundary control
similar to the version used in [39], i.e., we have $m = 64$. To make the stabilization
problem more difficult, we set $\alpha = 1{,}000$ for the de-stabilization, resulting in 105
unstable poles. Neither gplace (with assignment of random poles, mirrored
unstable poles, or $\{-\ell, -\ell + 1, \ldots, -1\}$ where $\ell = n_1 + n_2$ or $\ell = n_2$) nor gstab
were able to stabilize any of the systems: both fail for the full descriptor system
with error messages, and while they do compute feedback matrices for the generalized
state-space systems resulting from projecting onto the deflating subspaces
corresponding to finite eigenvalues and unstable poles, respectively, these feedbacks
are not stabilizing. The best result
using gplace was obtained when applied to the projected fully unstable system
with poles assigned to $\{-105, -104, \ldots, -1\}$. In this case, only 14 computed
closed-loop poles remained unstable. But it should be observed that none of the
computed closed-loop poles is close to the assigned ones! In all other attempts and
also for gstab, the number of unstable computed closed-loop poles was much
larger.

Fig. 3.2 Example 2 with B related to volume force (m = 2). Open-loop and closed-loop poles
computed using generalized Lyapunov and Bernoulli equations (top row) with close-up around
the origin (bottom row)

On the other hand, both versions of our partial stabilization algorithms based on 
the generalized Bass algorithm and the algebraic Bernoulli equation were able to 
stabilize this system. A second stabilization step as described in Remark 1 was 
necessary, though. For the generalized Bass algorithm with P = I, eight closed- 
loop poles remain unstable after the first stabilization step while only two unstable 
poles had to be treated in the second stabilization step when using the Bernoulli 
approach. The resulting gain matrices show that a lot of effort is needed to stabilize 
the system: ll^BassL ~ 1-4 • 10** and H/^abeL ~ 4.2 • 10^ The plotted pole dis- 
tributions shown in Fig. 3.3 demonstrate that the slightly higher effort of the 
Bernoulli approach is worthwhile as there are no highly undamped closed-loop poles
(i.e., poles with relatively large imaginary parts compared to their real parts). The 
close-ups in the bottom row of this figure also show again that unstable poles are 
reflected with respect to the imaginary axis in the Bernoulli approach. It must be
noted, though, that all closed-loop poles in the Bernoulli approach should
theoretically be real, which is obviously not true for the computed ones. This is
due to the fact that the closed-loop matrix pencil represents a highly non-normal
eigenvalue problem and small perturbations may lead to large deviations in the
eigenvalues. Fortunately, the closed-loop poles with nonzero imaginary parts are
far enough from the imaginary axis so that in practice, the closed-loop pole
distribution computed using the Bernoulli approach can be considered reasonable.

Fig. 3.3 Example 2 with B related to boundary control (m = 64). Open-loop and closed-loop
poles computed using generalized Lyapunov and Bernoulli equations (top row) with close-up
around the origin (bottom row)



3.6 Conclusions and Outlook 



The partial stabilization methods suggested in this paper use the disk function 
method to first separate finite and infinite, and then the disk or sign function 
method to separate stable and unstable poles of a stabilizable descriptor system. 
The stable poles are preserved while the unstable ones are stabilized by state 
feedback. The feedback gain matrix is computed using either the generalized Bass
algorithm as described in [8] (similar to [43]) or an approach based on the
algebraic Bernoulli equation. In the latter case, the stabilized poles are the mirror
images of the unstable open-loop poles. Due to the ill-conditioning of the
stabilization problem, computed closed-loop poles may fail to be stable, contrary
to what the theory predicts. Often, applying the partial stabilization algorithm
again to the closed-loop system resolves this problem. The numerical examples
demonstrate that our algorithm can solve stabilization problems that could not be
treated so far with available software.
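
The mirror-image property of the Bernoulli approach can be illustrated on a tiny example. The sketch below is only an illustration under simplifying assumptions that are not those of the chapter: it takes a standard system ($E = I$) whose matrix `A` is already fully unstable, assumes $(A, B)$ controllable, uses the closed-loop convention $A + BF$, and solves the Bernoulli equation $A^T X + XA - XBB^T X = 0$ through the Lyapunov equation $AY + YA^T = BB^T$ with $X = Y^{-1}$. The function name `bernoulli_feedback` and the test matrices are hypothetical.

```python
import numpy as np

def bernoulli_feedback(A, B):
    """Feedback F such that A + B F has the mirrored poles of an anti-stable A.

    Illustrative assumptions: E = I, all eigenvalues of A lie in the open
    right half-plane, and (A, B) is controllable.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Solve A Y + Y A^T = B B^T with Kronecker products (fine for tiny n only).
    L = np.kron(A, I) + np.kron(I, A)
    Y = np.linalg.solve(L, (B @ B.T).reshape(-1)).reshape(n, n)
    # X = Y^{-1} then satisfies A^T X + X A - X B B^T X = 0.
    X = np.linalg.inv(Y)
    return -B.T @ X

A = np.array([[1.0, 2.0], [0.0, 3.0]])   # open-loop poles 1 and 3 (unstable)
B = np.array([[0.0], [1.0]])
F = bernoulli_feedback(A, B)
closed_loop_poles = np.sort(np.linalg.eigvals(A + B @ F).real)
print(closed_loop_poles)                 # mirrored poles, close to -3 and -1
```

For realistic problem sizes one would replace the Kronecker solve by a dedicated Lyapunov or sign-function based solver; the dense $n^2 \times n^2$ system is used here only to keep the sketch dependent on NumPy alone.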

The disk function based stabilization method can be applied to fairly large 
systems with up to several thousand state-space variables as it can be implemented 
very efficiently on modern computer architectures. A parallel implementation of
the algorithm based on the matrix disk function and partial stabilization methods
for standard systems in generalized state-space form [11, 12] as implemented in
the "Parallel Library in Control" (PLiC, see [14]) is straightforward and is planned
for the future.

Chapter 4

Comparing Two Matrices by Means of Isometric Projections

T. P. Cason, P.-A. Absil and P. Van Dooren 



Abstract In this paper, we go over a number of optimization problems defined on 
a manifold in order to compare two matrices, possibly of different order. We 
consider several variants and show how these problems relate to various specific 
problems from the literature. 



4.1 Introduction 

When comparing two matrices A and B it is often natural to allow for a class of 
transformations acting on these matrices. For instance, when comparing adjacency 
matrices A and B of two graphs with an equal number of nodes, one can allow 
symmetric permutations $P^T A P$ on one matrix in order to compare it to B, since this
is merely a relabelling of the nodes of A. The so-called comparison then consists in 
finding the best match between A and B under this class of transformations. 

A more general class of transformations would be that of unitary similarity 
transformations $Q^*AQ$, where Q is a unitary matrix. This leaves the eigenvalues of
A unchanged but rotates its eigenvectors, which will of course play a role in the
comparison between A and B. If A and B are of different order, say m and n, one
may want to consider their restriction on a lower dimensional subspace:

$U^*AU$ and $V^*BV$, (4.1)

with U and V belonging to $\mathrm{St}(k, m)$ and $\mathrm{St}(k, n)$ respectively, and where
$\mathrm{St}(k, m) = \{U \in \mathbb{C}^{m \times k} : U^*U = I_k\}$ denotes the compact Stiefel manifold.
This yields two square matrices of equal dimension $k \le \min(m, n)$, which can again be compared.
But one still needs to define a measure of comparison between these restrictions 
of A and B which clearly depends on U and V. Fraikin et al. [1] propose in this 
context to maximize the inner product between the isometric projections, U*AU 
and V*BV, namely: 

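$$\arg\max_{U^*U = I_k,\; V^*V = I_k} \langle U^*AU, V^*BV \rangle := \Re\,\mathrm{tr}\big((U^*AU)^*(V^*BV)\big),$$

where $\Re$ denotes the real part of a complex number. They show this is also
equivalent to

$$\arg\max_{X = VU^*,\; U^*U = I_k,\; V^*V = I_k} \langle XA, BX \rangle = \Re\,\mathrm{tr}(A^*X^*BX),$$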

and eventually show how this problem is linked to the notion of graph similarity 
introduced by Blondel et al. [2]. The graph similarity matrix S introduced in that 
paper also proposes a way of comparing two matrices A and B via the fixed point 
of a particular iteration. But it is shown in [3] that this is equivalent to the 
optimization problem 

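$$\arg\max_{\|S\|_F = 1} \langle SA, BS \rangle = \Re\,\mathrm{tr}\big((SA)^*BS\big)$$

or also

$$\arg\max_{\|S\|_F = 1} \langle S, BSA^* \rangle = \Re\,\mathrm{tr}\big((SA)^*BS\big).$$

Notice that S also belongs to a Stiefel manifold, since $\mathrm{vec}(S) \in \mathrm{St}(1, mn)$.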

In this paper, we use a distance measure rather than an inner product to compare 
two matrices. As squared distance measure between two matrices M and N, we 
will use 

$$\mathrm{dist}^2(M, N) = \|M - N\|_F^2 = \mathrm{tr}\big((M - N)^*(M - N)\big).$$

We will analyze distance minimization problems that are essentially the coun- 
terparts of the similarity measures defined above. These are 

$$\arg\min_{U^*U = I_k,\; V^*V = I_k} \mathrm{dist}^2(U^*AU, V^*BV),$$

$$\arg\min_{X = VU^*,\; U^*U = I_k,\; V^*V = I_k} \mathrm{dist}^2(XA, BX),$$

and

$$\arg\min_{X = VU^*,\; U^*U = I_k,\; V^*V = I_k} \mathrm{dist}^2(X, BXA^*),$$

for the problems involving two isometries U and V. Notice that these three dis- 
tance problems are not equivalent although the corresponding inner product 
problems are equivalent. 
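
The remark above can be checked numerically. The following sketch draws random complex matrices and random points on the two Stiefel manifolds (all sizes, the seed, and the helper names `random_stiefel`, `inner` and `dist2` are arbitrary choices made for this illustration) and evaluates the three inner products and the three squared distances. The inner products coincide, whereas the squared distances generally differ because the norm terms $\|U^*AU\|_F^2$, $\|XA\|_F^2$, $\|X\|_F^2$, and so on, are not the same.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5, 4, 2

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

def random_stiefel(rows, cols):
    # Orthonormalize a random matrix with a thin QR factorization.
    Q, _ = np.linalg.qr(random_complex(rows, cols))
    return Q

def inner(M, N):
    # <M, N> = Re tr(M* N)
    return np.trace(M.conj().T @ N).real

def dist2(M, N):
    # dist^2(M, N) = ||M - N||_F^2
    return np.linalg.norm(M - N, "fro") ** 2

A, B = random_complex(m, m), random_complex(n, n)
U, V = random_stiefel(m, k), random_stiefel(n, k)
X = V @ U.conj().T                      # X = V U*, an n-by-m matrix

ip1 = inner(U.conj().T @ A @ U, V.conj().T @ B @ V)
ip2 = inner(X @ A, B @ X)
ip3 = inner(X, B @ X @ A.conj().T)

d1 = dist2(U.conj().T @ A @ U, V.conj().T @ B @ V)
d2 = dist2(X @ A, B @ X)
d3 = dist2(X, B @ X @ A.conj().T)

print(ip1, ip2, ip3)   # three equal values
print(d1, d2, d3)      # three generally different values
```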

Similarly, we will analyze the two problems 

$$\arg\min_{\|S\|_F = 1} \mathrm{dist}^2(SA, BS) = \mathrm{tr}\big((SA - BS)^*(SA - BS)\big)$$

and

$$\arg\min_{\|S\|_F = 1} \mathrm{dist}^2(S, BSA^*) = \mathrm{tr}\big((S - BSA^*)^*(S - BSA^*)\big),$$

for the problems involving a single matrix S. Again, these are not equivalent in 
their distance formulation although the corresponding inner product problems are 
equivalent. 

We will develop optimality conditions for those matrix comparison problems, 
indicate their relations with existing problems from the literature and give an 
analytic solution for particular matrices A and B. 



4.2 Preliminaries on Riemannian Optimization 

All those problems are defined on feasible sets that have a manifold structure. 
Roughly speaking, this means that the feasible set is locally smoothly identified
with $\mathbb{R}^d$, where d is the dimension of the manifold. Optimization on a manifold
generalizes optimization in $\mathbb{R}^d$ while retaining the concept of smoothness. In this
section, we recall the essential background on optimization on manifolds, and refer 
the reader to [4] for details. 

A well known and largely used class of manifolds is the class of embedded
submanifolds. The submersion theorem gives a useful sufficient condition to prove
that a subset of a manifold $\mathcal{M}$ is an embedded submanifold of $\mathcal{M}$. If there exists a
smooth mapping $F : \mathcal{M} \to \mathcal{N}'$ between two manifolds of dimension $d_m$ and
$d_{n'}$ ($< d_m$) and $y \in \mathcal{N}'$ such that the rank of F is equal to $d_{n'}$ at each point of
$\mathcal{N} := F^{-1}(y)$, then $\mathcal{N}$ is an embedded submanifold of $\mathcal{M}$ and the dimension of $\mathcal{N}$ is
$d_m - d_{n'}$.



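Example The unitary group $U(n) = \{Q \in \mathbb{C}^{n \times n} : Q^*Q = I_n\}$ is an embedded
submanifold of $\mathbb{C}^{n \times n}$. Indeed, consider the function

$$F : \mathbb{C}^{n \times n} \to \mathcal{S}_{\mathrm{Her}}(n) : Q \mapsto Q^*Q - I_n,$$

where $\mathcal{S}_{\mathrm{Her}}(n)$ denotes the set of Hermitian matrices of order n. Clearly,
$U(n) = F^{-1}(0_n)$. It remains to show that for all $\hat H \in \mathcal{S}_{\mathrm{Her}}(n)$, there exists an $H \in \mathbb{C}^{n \times n}$
such that $\mathrm{D}F(Q) \cdot H = Q^*H + H^*Q = \hat H$. It is easy to see that
$\mathrm{D}F(Q) \cdot (Q\hat H/2) = \hat H$, and according to the submersion theorem, it follows that $U(n)$ is an
embedded submanifold of $\mathbb{C}^{n \times n}$. The dimensions of $\mathbb{C}^{n \times n}$ and $\mathcal{S}_{\mathrm{Her}}(n)$ are $2n^2$ and
$n^2$ respectively. Hence $U(n)$ is of dimension $n^2$.

In our problems, embedding spaces are matrix-Euclidean spaces $\mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k}$
and $\mathbb{C}^{n \times m}$, which have a trivial manifold structure since $\mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k} \simeq \mathbb{R}^{2mk + 2nk}$ and
$\mathbb{C}^{n \times m} \simeq \mathbb{R}^{2mn}$. For each problem, we further analyze whether or not the feasible set
is an embedded submanifold of its embedding space.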

When working with a function on a manifold $\mathcal{M}$, one may be interested in
having a local linear approximation of that function. Let M be an element of $\mathcal{M}$
and $\mathfrak{F}_M(\mathcal{M})$ denote the set of smooth real-valued functions defined on a
neighborhood of M.

Definition 1.1 A tangent vector $\xi_M$ to a manifold $\mathcal{M}$ at a point M is a mapping
from $\mathfrak{F}_M(\mathcal{M})$ to $\mathbb{R}$ such that there exists a curve $\gamma$ on $\mathcal{M}$ with $\gamma(0) = M$, satisfying

$$\xi_M f = \left.\frac{\mathrm{d} f(\gamma(t))}{\mathrm{d}t}\right|_{t=0}, \quad \forall f \in \mathfrak{F}_M(\mathcal{M}).$$

Such a curve $\gamma$ is said to realize the tangent vector $\xi_M$.

So, the only thing we need to know about a curve $\gamma$ in order to compute the
first-order variation of a real-valued function f at $\gamma(0)$ along $\gamma$ is the tangent
vector $\xi_M$ realized by $\gamma$. The tangent space to $\mathcal{M}$ at M, denoted by $T_M\mathcal{M}$, is the
set of all tangent vectors to $\mathcal{M}$ at M and it admits a structure of vector space
over $\mathbb{R}$. When considering an embedded submanifold in a Euclidean space $\mathcal{E}$,
any tangent vector $\xi_M$ of the manifold is equivalent to a vector E of the
Euclidean space. Indeed, let $\bar f$ be any differentiable continuous extension of f
on $\mathcal{E}$; we have

$$\xi_M f := \left.\frac{\mathrm{d} f(\gamma(t))}{\mathrm{d}t}\right|_{t=0} = \mathrm{D}\bar f(M) \cdot E, \qquad (4.2)$$

where E is $\dot\gamma(0)$ and D is the directional derivative operator

$$\mathrm{D}\bar f(M) \cdot E = \lim_{t \to 0} \frac{\bar f(M + tE) - \bar f(M)}{t}.$$

The tangent space reduces to a linear subspace of the original space $\mathcal{E}$.

Example Let $\gamma(t)$ be a curve on the unitary group $U(n)$ passing through Q at $t = 0$,
i.e. $\gamma(t)^*\gamma(t) = I_n$ and $\gamma(0) = Q$. Differentiating with respect to t yields

$$\dot\gamma(0)^*Q + Q^*\dot\gamma(0) = 0_n.$$

One can see from Eq. 4.2 that the tangent space to $U(n)$ at Q is contained in

$$\{E \in \mathbb{C}^{n \times n} : E^*Q + Q^*E = 0_n\} = \{Q\Omega \in \mathbb{C}^{n \times n} : \Omega^* + \Omega = 0_n\}. \qquad (4.3)$$

Moreover, this set is a vector space over $\mathbb{R}$ of dimension $n^2$, and hence is the
tangent space itself.
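
Both unitary-group examples lend themselves to a quick numerical sanity check. The sketch below (sizes, seed and variable names are arbitrary choices for this illustration) builds a point Q on $U(n)$, verifies that $E = Q\Omega$ with $\Omega$ skew-Hermitian satisfies the tangency condition of (4.3), and verifies the submersion argument, namely that the derivative $H \mapsto Q^*H + H^*Q$ maps $Q\hat H/2$ to a prescribed Hermitian $\hat H$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

# A point Q on U(n), taken from the QR factorization of a random matrix.
Q, _ = np.linalg.qr(random_complex(n, n))

W = random_complex(n, n)
Omega = (W - W.conj().T) / 2    # skew-Hermitian: Omega* + Omega = 0
H_hat = (W + W.conj().T) / 2    # Hermitian target

# Tangency condition of (4.3): E* Q + Q* E = 0 for E = Q Omega.
E = Q @ Omega
tangency_residual = np.linalg.norm(E.conj().T @ Q + Q.conj().T @ E)

# Submersion argument: the derivative applied to Q H_hat / 2 returns H_hat.
direction = Q @ H_hat / 2
DF = Q.conj().T @ direction + direction.conj().T @ Q
submersion_residual = np.linalg.norm(DF - H_hat)

print(tangency_residual, submersion_residual)   # both at rounding level
```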

Let $g_M$ be an inner product defined on the tangent plane $T_M\mathcal{M}$. The gradient of
f at M, denoted $\mathrm{grad}\,f(M)$, is defined as the unique element of the tangent plane
$T_M\mathcal{M}$ that satisfies

$$\xi_M f = g_M(\mathrm{grad}\,f(M), \xi_M), \quad \forall \xi_M \in T_M\mathcal{M}.$$

The gradient, together with the inner product, fully characterizes the local first
order approximation of a smooth function defined on the manifold. In the case of
an embedded manifold of a Euclidean space $\mathcal{E}$, since $T_M\mathcal{M}$ is a linear subspace of
$T_M\mathcal{E}$, an inner product $\bar g_M$ on $T_M\mathcal{E}$ generates by restriction an inner product $g_M$ on
$T_M\mathcal{M}$. The orthogonal complement of $T_M\mathcal{M}$ with respect to $\bar g_M$ is called the
normal space to $\mathcal{M}$ at M and denoted by $(T_M\mathcal{M})^\perp$. The gradient of a smooth
function $\bar f$ defined on the embedding manifold may be decomposed into its
orthogonal projections on the tangent and normal space, respectively

$$P_M\,\mathrm{grad}\,\bar f(M) \quad\text{and}\quad P_M^\perp\,\mathrm{grad}\,\bar f(M),$$

and it follows that the gradient of f (the restriction of $\bar f$ on $\mathcal{M}$) is the projection on
the tangent space of the gradient of $\bar f$:

$$\mathrm{grad}\,f(M) = P_M\,\mathrm{grad}\,\bar f(M).$$
If $T_M\mathcal{M}$ is endowed with an inner product $g_M$ for all $M \in \mathcal{M}$ and $g_M$ varies
smoothly with M, then $\mathcal{M}$ is termed a Riemannian manifold.

Example Let A and B be two Hermitian matrices. We define

$$\bar f : \mathbb{C}^{n \times n} \to \mathbb{R} : Q \mapsto \Re\,\mathrm{tr}(Q^*AQB),$$

and f its restriction on the unitary group $U(n)$. We have

$$\mathrm{D}\bar f(Q) \cdot E = 2\,\Re\,\mathrm{tr}(E^*AQB).$$

We endow the tangent space $T_Q\mathbb{C}^{n \times n}$ with an inner product

$$\bar g : T_Q\mathbb{C}^{n \times n} \times T_Q\mathbb{C}^{n \times n} \to \mathbb{R} : E, F \mapsto \Re\,\mathrm{tr}(E^*F),$$

and the gradient of $\bar f$ at Q is then given by $\mathrm{grad}\,\bar f(Q) = 2AQB$. One can further
define an orthogonal projection on $T_Q U(n)$

$$P_Q E := E - Q\,\mathrm{Her}(Q^*E),$$

and the gradient of f at Q is given by $\mathrm{grad}\,f(Q) = P_Q\,\mathrm{grad}\,\bar f(Q)$.
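
As a hedged illustration of this example, the sketch below evaluates the Euclidean gradient $2AQB$, projects it with $P_Q$, and checks two things: the projected gradient is tangent to $U(n)$, and its inner product with a tangent direction E matches a central finite difference of the extended cost along E. Matrix sizes, the seed, the step `h` and all helper names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

def her(X):
    # Hermitian part, Her(X) = (X + X*) / 2
    return (X + X.conj().T) / 2

A, B = her(random_complex(n, n)), her(random_complex(n, n))
Q, _ = np.linalg.qr(random_complex(n, n))

def f_bar(Q):
    # Extended cost Re tr(Q* A Q B) on C^{n x n}
    return np.trace(Q.conj().T @ A @ Q @ B).real

egrad = 2 * A @ Q @ B                            # gradient in the embedding space
rgrad = egrad - Q @ her(Q.conj().T @ egrad)      # projection onto T_Q U(n)

# Tangency of the projected gradient: Her(Q* rgrad) = 0.
tangency_residual = np.linalg.norm(her(Q.conj().T @ rgrad))

# Directional derivative along a tangent vector E = Q Omega.
W = random_complex(n, n)
E = Q @ (W - W.conj().T) / 2
h = 1e-3
finite_difference = (f_bar(Q + h * E) - f_bar(Q - h * E)) / (2 * h)
inner_product = np.trace(rgrad.conj().T @ E).real

print(tangency_residual, finite_difference, inner_product)
```

Since the extended cost is quadratic in Q, the central difference is exact up to rounding, which is why a fairly large step can be used here.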

Those relations are useful when one wishes to analyze optimization problems, 
and will hence be further developed for the problems we are interested in. 

4.3 The Matrix Comparison Problems and Their Geometry 

Below we look at the various problems introduced earlier and focus on the first 
problem to make these ideas more explicit. 

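Problem 1 Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let

$$\bar f : \mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k} \to \mathbb{C} : (U, V) \mapsto \bar f(U, V) = \mathrm{dist}^2(U^*AU, V^*BV),$$

find the minimizer of

$$f : \mathrm{St}(k, m) \times \mathrm{St}(k, n) \to \mathbb{C} : (U, V) \mapsto f(U, V) = \bar f(U, V),$$

where

$$\mathrm{St}(k, m) = \{U \in \mathbb{C}^{m \times k} : U^*U = I_k\}$$

denotes the compact Stiefel manifold.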

Let $A = (A_1, A_2)$ and $B = (B_1, B_2)$ be pairs of matrices. We define the
following useful operations:

• an entrywise product, $A \circ B = (A_1B_1, A_2B_2)$,

• a contraction product, $A \star B = A_1B_1 + A_2B_2$, and

• a conjugate-transpose operation, $A^* = (A_1^*, A_2^*)$.

The definitions of the binary operations, $\circ$ and $\star$, are (for readability)
extended to single matrices when one has to deal with pairs of identical matrices.
Let, for instance, $A = (A_1, A_2)$ be a pair of matrices and B be a single matrix; we
define

$$A \circ B = (A_1, A_2) \circ B = (A_1, A_2) \circ (B, B) = (A_1B, A_2B),$$
$$A \star B = (A_1, A_2) \star B = (A_1, A_2) \star (B, B) = A_1B + A_2B.$$

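The feasible set of Problem 1 is given by the cartesian product of two
compact Stiefel manifolds, namely $\mathcal{M} = \mathrm{St}(k, m) \times \mathrm{St}(k, n)$, and is hence a
manifold itself (cf. [4]). Moreover, we can prove that $\mathcal{M}$ is an embedded
submanifold of

$$\mathcal{E} := \mathbb{C}^{m \times k} \times \mathbb{C}^{n \times k}.$$

Indeed, consider the function

$$F : \mathcal{E} \to \mathcal{S}_{\mathrm{Her}}(k) \times \mathcal{S}_{\mathrm{Her}}(k) : M \mapsto M^* \circ M - (I_k, I_k),$$

where $\mathcal{S}_{\mathrm{Her}}(k)$ denotes the set of Hermitian matrices of order k. Clearly,
$\mathcal{M} = F^{-1}(0_k, 0_k)$. It remains to show that each point $M \in \mathcal{M}$ is a regular point of
F, which means that F has full rank there, i.e. for all $\hat Z \in \mathcal{S}_{\mathrm{Her}}(k) \times \mathcal{S}_{\mathrm{Her}}(k)$, there exists
$Z \in \mathcal{E}$ such that $\mathrm{D}F(M) \cdot Z = \hat Z$. It is easy to see that $\mathrm{D}F(M) \cdot (M \circ \hat Z/2) = \hat Z$, and
according to the submersion theorem, it follows that $\mathcal{M}$ is an embedded
submanifold of $\mathcal{E}$.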

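The tangent space to $\mathcal{E}$ at a point $M = (U, V) \in \mathcal{E}$ is the embedding space itself
(i.e. $T_M\mathcal{E} \simeq \mathcal{E}$), whereas the tangent space to $\mathcal{M}$ at a point $M = (U, V) \in \mathcal{M}$ is
given by

$$T_M\mathcal{M} := \{\dot\gamma(0) : \gamma \text{ differentiable curve on } \mathcal{M} \text{ with } \gamma(0) = M\}$$
$$= \{\xi = (\xi_U, \xi_V) : \mathrm{Her}(\xi^* \circ M) = 0\}$$
$$= \{M \circ (\Omega_U, \Omega_V) + M_\perp \circ (K_U, K_V) : \Omega_U, \Omega_V \in \mathcal{S}_{\text{s-Her}}(k)\},$$

where $M_\perp = (U_\perp, V_\perp)$ with $U_\perp$ and $V_\perp$ any orthogonal complement of
respectively U and V, where $\mathrm{Her}(\cdot)$ stands for

$$\mathrm{Her}(\cdot) : X \mapsto (X + X^*)/2,$$

and where $\mathcal{S}_{\text{s-Her}}(k)$ denotes the set of skew-Hermitian matrices of order k. We
endow the tangent space $T_M\mathcal{E}$ with an inner product:

$$\bar g_M(\cdot, \cdot) : T_M\mathcal{E} \times T_M\mathcal{E} \to \mathbb{C} : \xi, \zeta \mapsto \bar g_M(\xi, \zeta) = \Re\,\mathrm{tr}(\xi^* \star \zeta),$$

and define its restriction on the tangent space $T_M\mathcal{M}$ ($\subset T_M\mathcal{E}$):

$$g_M(\cdot, \cdot) : T_M\mathcal{M} \times T_M\mathcal{M} \to \mathbb{C} : \xi, \zeta \mapsto g_M(\xi, \zeta) = \bar g_M(\xi, \zeta).$$

One may now define the normal space to $\mathcal{M}$ at a point $M \in \mathcal{M}$:

$$T_M^\perp\mathcal{M} := \{\xi : \bar g_M(\xi, \zeta) = 0, \ \forall \zeta \in T_M\mathcal{M}\}$$
$$= \{M \circ (H_U, H_V) : H_U, H_V \in \mathcal{S}_{\mathrm{Her}}(k)\},$$

where $\mathcal{S}_{\mathrm{Her}}(k)$ denotes the set of Hermitian matrices of order k.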
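
Problem 2 Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let

$$\bar f : \mathbb{C}^{n \times m} \to \mathbb{C} : X \mapsto \bar f(X) = \mathrm{dist}^2(XA, BX),$$

find the minimizer of

$$f : \mathcal{M} \to \mathbb{C} : X \mapsto f(X) = \bar f(X),$$

where $\mathcal{M} = \{VU^* \in \mathbb{C}^{n \times m} : (U, V) \in \mathrm{St}(k, m) \times \mathrm{St}(k, n)\}$.

$\mathcal{M}$ is a smooth and connected manifold. Indeed, let

$$\Sigma := \begin{pmatrix} I_k & 0 \\ 0 & 0 \end{pmatrix}$$

be an element of $\mathcal{M}$. Every $X \in \mathcal{M}$ is congruent to $\Sigma$ by the congruence action
$((U, V), X) \mapsto V^*XU$, $(U, V) \in U(m) \times U(n)$, where $U(m) = \{U \in \mathbb{C}^{m \times m} : U^*U = I_m\}$ denotes the
unitary group of degree m. The set $\mathcal{M}$ is an orbit of this smooth complex algebraic Lie
group action of $U(m) \times U(n)$ on $\mathbb{C}^{n \times m}$ and therefore a smooth manifold [5, App. C].
$\mathcal{M}$ is the image of the connected set $U(m) \times U(n)$ under the continuous (and in fact
smooth) map $\pi : U(m) \times U(n) \to \mathbb{C}^{n \times m}$, $\pi(U, V) = V^*\Sigma U$, and hence is also
connected.

The tangent space to $\mathcal{M}$ at a point $X = VU^* \in \mathcal{M}$ is

$$T_X\mathcal{M} := \{\dot\gamma(0) : \gamma \text{ curve on } \mathcal{M} \text{ with } \gamma(0) = X\}$$
$$= \{\xi_V U^* + V\xi_U^* : \mathrm{Her}(V^*\xi_V) = \mathrm{Her}(U^*\xi_U) = 0_k\}$$
$$= \{V\Omega U^* + VK_U^*U_\perp^* + V_\perp K_V U^* : \Omega \in \mathcal{S}_{\text{s-Her}}(k)\}.$$

We endow the tangent space $T_X\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$ with an inner product:

$$\bar g_X(\cdot, \cdot) : T_X\mathbb{C}^{n \times m} \times T_X\mathbb{C}^{n \times m} \to \mathbb{C} : \xi, \zeta \mapsto \bar g_X(\xi, \zeta) = \Re\,\mathrm{tr}(\xi^*\zeta),$$

and define its restriction on the tangent space $T_X\mathcal{M}$ ($\subset T_X\mathcal{E}$):

$$g_X(\cdot, \cdot) : T_X\mathcal{M} \times T_X\mathcal{M} \to \mathbb{C} : \xi, \zeta \mapsto g_X(\xi, \zeta) = \bar g_X(\xi, \zeta).$$

One may now define the normal space to $\mathcal{M}$ at a point $X \in \mathcal{M}$:

$$T_X^\perp\mathcal{M} := \{\xi : \bar g_X(\xi, \zeta) = 0, \ \forall \zeta \in T_X\mathcal{M}\}$$
$$= \{VHU^* + V_\perp K U_\perp^* : H \in \mathcal{S}_{\mathrm{Her}}(k)\}.$$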

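Problem 3 Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let

$$\bar f : \mathbb{C}^{n \times m} \to \mathbb{C} : X \mapsto \bar f(X) = \mathrm{dist}^2(X, BXA^*),$$

find the minimizer of

$$f : \mathcal{M} \to \mathbb{C} : X \mapsto f(X) = \bar f(X),$$

where $\mathcal{M} = \{VU^* \in \mathbb{C}^{n \times m} : (U, V) \in \mathrm{St}(k, m) \times \mathrm{St}(k, n)\}$.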

Since they have the same feasible set, developments obtained for Problem 2 
hold also for Problem 3. 

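Problem 4 Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let

$$\bar f : \mathbb{C}^{n \times m} \to \mathbb{C} : S \mapsto \bar f(S) = \mathrm{dist}^2(SA, BS),$$

find the minimizer of

$$f : \mathcal{M} \to \mathbb{C} : S \mapsto f(S) = \bar f(S),$$

where $\mathcal{M} = \{S \in \mathbb{C}^{n \times m} : \|S\|_F = 1\}$.

The tangent space to $\mathcal{M}$ at a point $S \in \mathcal{M}$ is

$$T_S\mathcal{M} = \{\xi : \Re\,\mathrm{tr}(\xi^*S) = 0\}.$$

We endow the tangent space $T_S\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$ with an inner product:

$$\bar g_S(\cdot, \cdot) : T_S\mathbb{C}^{n \times m} \times T_S\mathbb{C}^{n \times m} \to \mathbb{C} : \xi, \zeta \mapsto \bar g_S(\xi, \zeta) = \Re\,\mathrm{tr}(\xi^*\zeta),$$

and define its restriction on the tangent space $T_S\mathcal{M}$ ($\subset T_S\mathcal{E}$):

$$g_S(\cdot, \cdot) : T_S\mathcal{M} \times T_S\mathcal{M} \to \mathbb{C} : \xi, \zeta \mapsto g_S(\xi, \zeta) = \bar g_S(\xi, \zeta).$$

One may now define the normal space to $\mathcal{M}$ at a point $S \in \mathcal{M}$:

$$T_S^\perp\mathcal{M} := \{\xi : \bar g_S(\xi, \zeta) = 0, \ \forall \zeta \in T_S\mathcal{M}\} = \{\alpha S : \alpha \in \mathbb{R}\}.$$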

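Problem 5 Given $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, let

$$\bar f : \mathbb{C}^{n \times m} \to \mathbb{C} : S \mapsto \bar f(S) = \mathrm{dist}^2(S, BSA^*),$$

find the minimizer of

$$f : \mathcal{M} \to \mathbb{C} : S \mapsto f(S) = \bar f(S),$$

where $\mathcal{M} = \{S \in \mathbb{C}^{n \times m} : \|S\|_F = 1\}$.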

Since they have the same feasible set, developments obtained for Problem 4 
also hold for Problem 5. 

4.4 Optimality Conditions 

Our problems are optimization problems of smooth functions defined on a compact
domain $\mathcal{M}$, and therefore there always exists an optimal solution $M \in \mathcal{M}$ where
the first order optimality condition is satisfied,

$$\mathrm{grad}\,f(M) = 0. \qquad (4.4)$$

We study the stationary points of Problem 1 in detail, and we show how the other 
problems can be tackled. 

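Problem 1 We first analyze this optimality condition for Problem 1. For any
$(W, Z) \in T_M\mathcal{E}$, we have

$$\mathrm{D}\bar f(U, V) \cdot (W, Z) = 2\,\Re\,\mathrm{tr}\big((W^*AU + U^*AW - Z^*BV - V^*BZ)^*(U^*AU - V^*BV)\big), \qquad (4.5)$$

with $\Delta_{AB} := U^*AU - V^*BV =: -\Delta_{BA}$, and hence the gradient of $\bar f$ at a point
$(U, V) \in \mathcal{E}$ is

$$\mathrm{grad}\,\bar f(U, V) = 2\begin{pmatrix} AU\Delta_{AB}^* + A^*U\Delta_{AB} \\ BV\Delta_{BA}^* + B^*V\Delta_{BA} \end{pmatrix}. \qquad (4.6)$$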

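Since the normal space $T_M^\perp\mathcal{M}$ is the orthogonal complement of the tangent space
$T_M\mathcal{M}$, one can, for any $M \in \mathcal{M}$, decompose any $E \in \mathcal{E}$ into its orthogonal
projections on $T_M\mathcal{M}$ and $T_M^\perp\mathcal{M}$:

$$P_M E := E - P_M^\perp E \quad\text{and}\quad P_M^\perp E := M \circ \mathrm{Her}(M^* \circ E). \qquad (4.7)$$

The gradient of f at a point $(U, V) \in \mathcal{M}$ is

$$\mathrm{grad}\,f(U, V) = P_M\,\mathrm{grad}\,\bar f(M). \qquad (4.8)$$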
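
The formulas (4.6) to (4.8) translate directly into a few lines of code. The sketch below, in which the function names `cost`, `euclidean_grad` and `project`, the sizes and the seed are all choices made for this illustration, evaluates the Euclidean gradient of the squared distance, projects each component onto the tangent space of its Stiefel manifold, and compares the Euclidean gradient with a central finite difference along a random direction $(W, Z)$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5, 4, 2

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

def her(X):
    return (X + X.conj().T) / 2

def cost(A, B, U, V):
    # Squared distance ||U* A U - V* B V||_F^2
    return np.linalg.norm(U.conj().T @ A @ U - V.conj().T @ B @ V) ** 2

def euclidean_grad(A, B, U, V):
    # Gradient (4.6); note Delta_BA = -Delta_AB.
    D = U.conj().T @ A @ U - V.conj().T @ B @ V
    gU = 2 * (A @ U @ D.conj().T + A.conj().T @ U @ D)
    gV = -2 * (B @ V @ D.conj().T + B.conj().T @ V @ D)
    return gU, gV

def project(M, E):
    # Componentwise form of (4.7): P_M E = E - M Her(M* E)
    return E - M @ her(M.conj().T @ E)

A, B = random_complex(m, m), random_complex(n, n)
U, _ = np.linalg.qr(random_complex(m, k))
V, _ = np.linalg.qr(random_complex(n, k))

gU, gV = euclidean_grad(A, B, U, V)
rU, rV = project(U, gU), project(V, gV)        # Riemannian gradient (4.8)
tangency = np.linalg.norm(her(U.conj().T @ rU)) + np.linalg.norm(her(V.conj().T @ rV))

# Finite-difference check of (4.6) along an arbitrary direction (W, Z).
W, Z = random_complex(m, k), random_complex(n, k)
h = 1e-5
fd = (cost(A, B, U + h * W, V + h * Z) - cost(A, B, U - h * W, V - h * Z)) / (2 * h)
ip = np.trace(gU.conj().T @ W).real + np.trace(gV.conj().T @ Z).real
print(tangency, fd, ip)
```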

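For our problem, the first order optimality condition (4.4) yields, by means of
(4.6), (4.7) and (4.8)

$$\begin{pmatrix} AU\Delta_{AB}^* + A^*U\Delta_{AB} \\ BV\Delta_{BA}^* + B^*V\Delta_{BA} \end{pmatrix} = \begin{pmatrix} U \\ V \end{pmatrix} \circ \mathrm{Her}\begin{pmatrix} U^*AU\Delta_{AB}^* + U^*A^*U\Delta_{AB} \\ V^*BV\Delta_{BA}^* + V^*B^*V\Delta_{BA} \end{pmatrix}. \qquad (4.9)$$

Observe that f is constant on the equivalence classes

$$[(U, V)] = \{(U, V) \circ Q : Q \in U(k)\},$$

and that any point of $[(U, V)]$ is a stationary point of f whenever $(U, V)$ is.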

We consider the special case where $U^*AU$ and $V^*BV$ are simultaneously
diagonalizable by a unitary matrix at all stationary points $(U, V)$, i.e. the
eigendecompositions of $U^*AU$ and $V^*BV$ are respectively $WD_AW^*$ and $WD_BW^*$, with
$W \in U(k)$ and $D_A = \mathrm{diag}(\theta_1^A, \ldots, \theta_k^A)$, $D_B = \mathrm{diag}(\theta_1^B, \ldots, \theta_k^B)$. This happens when
A and B are both Hermitian. Indeed, in that case, applying $(U^*, V^*)\,\circ$ on the left of
(4.9) yields

$$U^*AU\,V^*BV = V^*BV\,U^*AU,$$

which implies that $U^*AU$ and $V^*BV$ have the same eigenvectors. The cost function
at stationary points simply reduces to $\sum_{i=1}^{k} |\theta_i^A - \theta_i^B|^2$ and the minimization
problem roughly consists in finding the isometric projections $U^*AU$, $V^*BV$ such
that their eigenvalues are pairwise as near as possible.
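More precisely, the first optimality condition becomes

$$\left[\begin{pmatrix} A \\ B \end{pmatrix} \circ \begin{pmatrix} U \\ V \end{pmatrix} \circ W - \begin{pmatrix} U \\ V \end{pmatrix} \circ W \circ \begin{pmatrix} D_A \\ D_B \end{pmatrix}\right] \circ \left[\begin{pmatrix} D_A \\ D_B \end{pmatrix} - \begin{pmatrix} D_B \\ D_A \end{pmatrix}\right] = 0, \qquad (4.10)$$

that is,

$$\left[\begin{pmatrix} A \\ B \end{pmatrix} \circ \begin{pmatrix} \bar U_i \\ \bar V_i \end{pmatrix} - \begin{pmatrix} \bar U_i \\ \bar V_i \end{pmatrix} \circ \begin{pmatrix} \theta_i^A \\ \theta_i^B \end{pmatrix}\right] \circ \left[\begin{pmatrix} \theta_i^A \\ \theta_i^B \end{pmatrix} - \begin{pmatrix} \theta_i^B \\ \theta_i^A \end{pmatrix}\right] = 0, \quad i = 1, \ldots, k, \qquad (4.11)$$

where $\bar U_i$ and $\bar V_i$ denote the ith column of $\bar U = UW$ and $\bar V = VW$ respectively.
This implies that for all $i = 1, \ldots, k$ either $\theta_i^A = \theta_i^B$ or $(\theta_i^A, \bar U_i)$ and $(\theta_i^B, \bar V_i)$ are
eigenpairs of respectively A and B.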

Definition 1.2 Let A and $A_U$ be two square matrices respectively of order m and k,
where $m \ge k$. $A_U$ is said to be imbeddable in A if there exists a matrix $U \in \mathrm{St}(k, m)$
such that $U^*AU = A_U$.

For Hermitian matrices, [6] gives us the following result.

Theorem 1.3 Let A and $A_U$ be Hermitian matrices and $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$ and
$\theta_1^A \le \theta_2^A \le \cdots \le \theta_k^A$ their respective eigenvalues. Then a necessary and sufficient
condition for $A_U$ to be imbeddable in A is that

$$\theta_i^A \in [\alpha_i, \alpha_{i-k+m}], \quad i = 1, \ldots, k.$$

The necessity part of this theorem is well known as the Cauchy interlacing
theorem [7, p. 202]. The sufficiency part is less easy to prove, cf. [6, 8].

Definition 1.4 For $S_1$ and $S_2$, two non-empty subsets of a metric space together
with the distance d, we define

$$e_d(S_1, S_2) = \inf_{s_1 \in S_1,\; s_2 \in S_2} d(s_1, s_2).$$

When considering two non-empty subsets $[\alpha_1, \alpha_2]$ and $[\beta_1, \beta_2]$ of $\mathbb{R}$, one can easily
see that

$$e_d([\alpha_1, \alpha_2], [\beta_1, \beta_2]) = \max(0, \alpha_1 - \beta_2, \beta_1 - \alpha_2).$$



Theorem 1.5 Let $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_m$ and $\beta_1 \le \beta_2 \le \cdots \le \beta_n$ be the eigenvalues
of Hermitian matrices respectively A and B. The solution of Problem 1 is

$$\sum_{i=1}^{k} \big(e_d([\alpha_i, \alpha_{i-k+m}], [\beta_i, \beta_{i-k+n}])\big)^2$$

with d the Euclidean norm.

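Proof Recall that when A and B are Hermitian matrices, $U^*AU$ and $V^*BV$ are
jointly diagonalizable by a unitary matrix at all stationary points $(U, V)$. Since the
distance is invariant by joint unitary transformation, the value of the minimum is
not affected if we restrict $(U, V)$ to be such that $U^*AU$ and $V^*BV$ are diagonal.
Problem 1 reduces to minimize

$$f(U, V) = \sum_{i=1}^{k} |\theta_i^A - \theta_i^B|^2,$$

where $U^*AU = \mathrm{diag}(\theta_1^A, \ldots, \theta_k^A)$ and $V^*BV = \mathrm{diag}(\theta_1^B, \ldots, \theta_k^B)$. It follows from
Theorem 1.3 that the minimum of Problem 1 is

$$\min_{\theta^A,\, \theta^B,\, \pi} \sum_{i=1}^{k} \big(\theta_i^A - \theta_{\pi(i)}^B\big)^2 \qquad (4.12)$$

such that

$$\theta_1^A \le \theta_2^A \le \cdots \le \theta_k^A, \quad \theta_1^B \le \theta_2^B \le \cdots \le \theta_k^B, \qquad (4.13)$$

$$\theta_i^A \in [\alpha_i, \alpha_{i-k+m}], \quad \theta_i^B \in [\beta_i, \beta_{i-k+n}], \qquad (4.14)$$

and $\pi(\cdot)$ is a permutation of $1, \ldots, k$.

Let $\theta_1^A, \ldots, \theta_k^A$ and $\theta_1^B, \ldots, \theta_k^B$ satisfy (4.13). Then, the identity permutation
$\pi(i) = i$ is optimal for problem (4.12). Indeed, if $\pi$ is not the identity, then there
exist i and j such that $i < j$ and $\pi(i) > \pi(j)$, and we have

$$\big(\theta_i^A - \theta_{\pi(j)}^B\big)^2 + \big(\theta_j^A - \theta_{\pi(i)}^B\big)^2 - \Big[\big(\theta_i^A - \theta_{\pi(i)}^B\big)^2 + \big(\theta_j^A - \theta_{\pi(j)}^B\big)^2\Big] = 2\big(\theta_i^A - \theta_j^A\big)\big(\theta_{\pi(i)}^B - \theta_{\pi(j)}^B\big) \le 0.$$

Since the identity permutation is optimal, our minimization problem simply
reduces to

$$\min_{(4.13),\,(4.14)} \sum_{i=1}^{k} (\theta_i^A - \theta_i^B)^2.$$

We now show that (4.13) can be relaxed. Indeed, assume there is an optimal
solution that does not satisfy the ordering condition, i.e. there exist i and j, $i < j$,
such that $\theta_j^A < \theta_i^A$. One can see that the following inequalities hold

$$\alpha_i \le \alpha_j \le \theta_j^A < \theta_i^A \le \alpha_{i-k+m} \le \alpha_{j-k+m}.$$

Since $\theta_i^A$ belongs to $[\alpha_j, \alpha_{j-k+m}]$ and $\theta_j^A$ belongs to $[\alpha_i, \alpha_{i-k+m}]$, one can switch i and
j and build an ordered solution that does not change the cost function and hence
remains optimal.

It follows that $\min_{(4.13),\,(4.14)} \sum_{i=1}^{k} (\theta_i^A - \theta_i^B)^2$ is equal to $\sum_{i=1}^{k} \min_{(4.14)} (\theta_i^A - \theta_i^B)^2$.
This result is precisely what we were looking for. □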
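
Theorem 1.5 gives a closed-form value that is cheap to evaluate from the two spectra. The following sketch computes it; the function names `interval_gap` and `problem1_minimum` and the small diagonal test matrices are chosen for this illustration only.

```python
import numpy as np

def interval_gap(a_lo, a_hi, b_lo, b_hi):
    # e_d between two real intervals (Definition 1.4): max(0, a_lo - b_hi, b_lo - a_hi)
    return max(0.0, a_lo - b_hi, b_lo - a_hi)

def problem1_minimum(A, B, k):
    """Minimum of Problem 1 for Hermitian A (order m) and B (order n)."""
    alpha = np.linalg.eigvalsh(A)   # eigenvalues in ascending order
    beta = np.linalg.eigvalsh(B)
    m, n = len(alpha), len(beta)
    total = 0.0
    for i in range(k):              # zero-based index i stands for i + 1 in the text
        gap = interval_gap(alpha[i], alpha[i + m - k], beta[i], beta[i + n - k])
        total += gap ** 2
    return total

A = np.diag([0.0, 1.0, 2.0])
B = np.diag([5.0, 6.0])
value = problem1_minimum(A, B, 2)
print(value)
```

In this example V is square, so $V^*BV$ always has eigenvalues 5 and 6, while the best 2-dimensional compression of A has eigenvalues 1 and 2; the squared distance is therefore $(1-5)^2 + (2-6)^2 = 32$, which is the value returned.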

Figure 4.1 gives an example of optimal matching.

Problem 2 For all $Y \in T_X\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$, we have

$$\mathrm{D}\bar f(X) \cdot Y = 2\,\Re\,\mathrm{tr}\big(Y^*(XAA^* - B^*XA - BXA^* + B^*BX)\big), \qquad (4.15)$$

and hence the gradient of $\bar f$ at a point $X \in \mathbb{C}^{n \times m}$ is

$$\mathrm{grad}\,\bar f(X) = 2(XAA^* - B^*XA - BXA^* + B^*BX). \qquad (4.16)$$

Since the normal space $T_X^\perp\mathcal{M}$ is the orthogonal complement of the tangent space
$T_X\mathcal{M}$, one can, for any $X = VU^* \in \mathcal{M}$, decompose any $E \in \mathbb{C}^{n \times m}$ into its
orthogonal projections on $T_X\mathcal{M}$ and $T_X^\perp\mathcal{M}$:

$$P_X E = E - V\,\mathrm{Her}(V^*EU)\,U^* - (I_n - VV^*)E(I_m - UU^*), \quad\text{and}$$
$$P_X^\perp E = V\,\mathrm{Her}(V^*EU)\,U^* + (I_n - VV^*)E(I_m - UU^*). \qquad (4.17)$$

For any $Y \in T_X\mathcal{M}$, (4.15) hence yields

$$\mathrm{D}\bar f(X) \cdot Y = \mathrm{D}f(X) \cdot Y = g_X(P_X\,\mathrm{grad}\,\bar f(X), Y),$$

and the gradient of f at a point $X = VU^* \in \mathcal{M}$ is

$$\mathrm{grad}\,f(X) = P_X\,\mathrm{grad}\,\bar f(X). \qquad (4.18)$$

Fig. 4.1 Let the $\alpha_i$ and $\beta_i$ be the eigenvalues of the Hermitian matrices A and B, and k = 3.
Problem 1 is then equivalent to $\sum_{i=1}^{3} \min_{\theta_i^A, \theta_i^B} (\theta_i^A - \theta_i^B)^2$ such that $\theta_i^A \in [\alpha_i, \alpha_{i+3}]$, $\theta_i^B \in [\beta_i, \beta_{i+4}]$.
The two first terms of this sum have strictly positive contributions whereas the third one can be
reduced to zero within a continuous set of values for $\theta_3^A$ and $\theta_3^B$ in $[\alpha_3, \beta_7]$

Problem 3 This problem is very similar to Problem 2. We have

$$\mathrm{D}\bar f(X) \cdot Y = 2\,\Re\,\mathrm{tr}\big(Y^*(X - B^*XA - BXA^* + B^*BXA^*A)\big), \qquad (4.19)$$

for all $Y \in T_X\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$, and hence the gradient of $\bar f$ at a point $X \in \mathbb{C}^{n \times m}$ is

$$\mathrm{grad}\,\bar f(X) = 2(X - B^*XA - BXA^* + B^*BXA^*A).$$

The feasible set is the same as in Problem 2. Hence the orthogonal decomposition
(4.17) holds, and the gradient of f at a point $X = VU^* \in \mathcal{M}$ is

$$\mathrm{grad}\,f(X) = P_X\,\mathrm{grad}\,\bar f(X).$$

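Problem 4 For all $T \in T_S\mathbb{C}^{n \times m} \simeq \mathbb{C}^{n \times m}$, we have

$$\mathrm{D}\bar f(S) \cdot T = 2\,\Re\,\mathrm{tr}\big(T^*(SAA^* - B^*SA - BSA^* + B^*BS)\big), \qquad (4.20)$$

and hence the gradient of $\bar f$ at a point $S \in \mathbb{C}^{n \times m}$ is

$$\mathrm{grad}\,\bar f(S) = 2(SAA^* - B^*SA - BSA^* + B^*BS). \qquad (4.21)$$

Since the normal space, $T_S^\perp\mathcal{M}$, is the orthogonal complement of the tangent space,
$T_S\mathcal{M}$, one can, for any $S \in \mathcal{M}$, decompose any $E \in \mathbb{C}^{n \times m}$ into its orthogonal
projections on $T_S\mathcal{M}$ and $T_S^\perp\mathcal{M}$:

$$P_S E = E - S\,\Re\,\mathrm{tr}(S^*E) \quad\text{and}\quad P_S^\perp E = S\,\Re\,\mathrm{tr}(S^*E). \qquad (4.22)$$

For any $T \in T_S\mathcal{M}$, (4.20) then yields

$$\mathrm{D}\bar f(S) \cdot T = \mathrm{D}f(S) \cdot T = g_S(P_S\,\mathrm{grad}\,\bar f(S), T),$$

and the gradient of f at a point $S \in \mathcal{M}$ is $\mathrm{grad}\,f(S) = P_S\,\mathrm{grad}\,\bar f(S)$.
For our problem, (4.4) yields, by means of (4.21) and (4.22)

$$\lambda S = (SA - BS)A^* - B^*(SA - BS),$$

where $\lambda = \mathrm{tr}\big((SA - BS)^*(SA - BS)\big) = f(S)$. Its equivalent vectorized form is

$$\lambda\,\mathrm{vec}(S) = (A^T \otimes I - I \otimes B)^*(A^T \otimes I - I \otimes B)\,\mathrm{vec}(S).$$

Hence, the stationary points of Problem 4 are given by the eigenvectors of
$(A^T \otimes I - I \otimes B)^*(A^T \otimes I - I \otimes B)$. The cost function f simply reduces to the
corresponding eigenvalue and the minimal cost is then the smallest eigenvalue.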
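
The eigenvalue characterization of Problem 4 can be turned into a direct dense computation for small matrices. In the sketch below the function name `solve_problem4`, the sizes and the random data are assumptions of this illustration; `vec` is taken column-major (Fortran order), so that $\mathrm{vec}(SA - BS) = (A^T \otimes I_n - I_m \otimes B)\,\mathrm{vec}(S)$.

```python
import numpy as np

def solve_problem4(A, B):
    """Minimal cost and a minimizer S (n-by-m, unit Frobenius norm) of ||SA - BS||_F^2."""
    m, n = A.shape[0], B.shape[0]
    # Column-major vec: vec(S A) = (A^T kron I_n) vec(S), vec(B S) = (I_m kron B) vec(S)
    K = np.kron(A.T, np.eye(n)) - np.kron(np.eye(m), B)
    eigenvalues, eigenvectors = np.linalg.eigh(K.conj().T @ K)   # ascending order
    S = eigenvectors[:, 0].reshape((n, m), order="F")
    return eigenvalues[0], S

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
lam, S = solve_problem4(A, B)
achieved = np.linalg.norm(S @ A - B @ S) ** 2
print(lam, achieved)   # the two numbers agree
```

The Kronecker matrix has order mn, so this dense approach is only meant for small problems.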

Problem 5 This problem is very similar to Problem 4. A similar approach yields

$$\lambda S = (S - BSA^*) - B^*(S - BSA^*)A,$$

where $\lambda = \mathrm{tr}\big((S - BSA^*)^*(S - BSA^*)\big)$. Its equivalent vectorized form is

$$\lambda\,\mathrm{vec}(S) = (I \otimes I - \bar A \otimes B)^*(I \otimes I - \bar A \otimes B)\,\mathrm{vec}(S),$$

where $\bar A$ denotes the complex conjugate of A.

Hence, the stationary points of Problem 5 are given by the eigenvectors of
$(I \otimes I - \bar A \otimes B)^*(I \otimes I - \bar A \otimes B)$, and the cost function f again simply reduces to
the corresponding eigenvalue and the minimal cost is then the smallest eigenvalue.
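
The same dense computation applies to Problem 5, with the identity $\mathrm{vec}(BSA^*) = (\bar A \otimes B)\,\mathrm{vec}(S)$ for column-major `vec`. As before, the function name `solve_problem5` and the random test data are assumptions made for this sketch.

```python
import numpy as np

def solve_problem5(A, B):
    """Minimal cost and a minimizer S (n-by-m, unit Frobenius norm) of ||S - B S A*||_F^2."""
    m, n = A.shape[0], B.shape[0]
    # Column-major vec: vec(B S A*) = (conj(A) kron B) vec(S)
    K = np.eye(m * n) - np.kron(A.conj(), B)
    eigenvalues, eigenvectors = np.linalg.eigh(K.conj().T @ K)   # ascending order
    S = eigenvectors[:, 0].reshape((n, m), order="F")
    return eigenvalues[0], S

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
lam5, S5 = solve_problem5(A, B)
achieved5 = np.linalg.norm(S5 - B @ S5 @ A.conj().T) ** 2
print(lam5, achieved5)   # the two numbers agree
```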



4.5 Iterative Methods 

The solutions of the problems mentioned above may not have a closed form 
expression or may be very expensive to compute. Iterative optimization methods 
build a sequence of iterates that hopefully converges as fast as possible towards the 
optimal solution and eventually gives an estimate of it. 

The complexity of classical algorithms can significantly increase according to
the number of variables to deal with. An interesting approach when one works on a
feasible set that has a manifold structure is to use classical methods defined
on Euclidean spaces and apply them to the tangent space to the manifold (see [4]).
The link between the tangent space and the manifold is naturally done using a
so-called retraction mapping. Let M be a point of a manifold $\mathcal{M}$; the retraction
$R_M$ is a smooth function that maps the tangent vector $\xi_M \in T_M\mathcal{M}$ to a point on
$\mathcal{M}$ such that

• $0_M$ (the zero element of $T_M\mathcal{M}$) is mapped onto M, and

• there is no distortion around the origin, which means that

$$\mathrm{D}R_M(0_M) = \mathrm{id}_{T_M\mathcal{M}},$$

where $\mathrm{id}_{T_M\mathcal{M}}$ denotes the identity mapping on $T_M\mathcal{M}$.

Given a cost function f defined on a manifold $\mathcal{M}$, one can build a pullback cost
function $\hat f_M = f \circ R_M$ on the vector space $T_M\mathcal{M}$. One can now easily generalize
methods defined on Euclidean space. A well known class of iterative methods are
line-search methods. In $\mathbb{R}^d$, choose a starting point $x_0$ and proceed through the
following iteration

$$x_{k+1} = x_k + t_k z_k,$$

where $z_k$ is a suitable search direction and $t_k$ a scalar called the step size. This
iteration can be generalized as follows on manifolds:

$$M_{k+1} = R_{M_k}(t_k \xi_k),$$

where $\xi_k$ is a tangent vector. For minimization problems, one may choose the
opposite of the gradient as search direction

$$\xi_k = -\mathrm{grad}\,f(M_k).$$

This particular case is known as the steepest descent method. The step size can
further be set using the so-called Armijo Back-Tracking scheme, for example.
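
A minimal sketch of such a steepest descent iteration for Problem 1 is given below. Several ingredients are assumptions of this illustration rather than choices made in the chapter: the retraction is the QR-based one (the Q factor of $M + \xi$, normalized so that R has a positive diagonal), the Armijo parameters `beta` and `sigma`, the iteration count, the random Hermitian test matrices, and all function names.

```python
import numpy as np

def her(X):
    return (X + X.conj().T) / 2

def cost(A, B, U, V):
    return np.linalg.norm(U.conj().T @ A @ U - V.conj().T @ B @ V) ** 2

def riemannian_grad(A, B, U, V):
    # Euclidean gradient (4.6) followed by the tangent projection (4.7).
    D = U.conj().T @ A @ U - V.conj().T @ B @ V
    gU = 2 * (A @ U @ D.conj().T + A.conj().T @ U @ D)
    gV = -2 * (B @ V @ D.conj().T + B.conj().T @ V @ D)
    return gU - U @ her(U.conj().T @ gU), gV - V @ her(V.conj().T @ gV)

def retract(M, xi):
    # QR-based retraction (an assumed choice): Q factor with positive diagonal of R.
    Q, R = np.linalg.qr(M + xi)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def steepest_descent(A, B, U, V, iters=200, beta=0.5, sigma=1e-4):
    for _ in range(iters):
        rU, rV = riemannian_grad(A, B, U, V)
        g2 = np.linalg.norm(rU) ** 2 + np.linalg.norm(rV) ** 2
        if g2 < 1e-20:
            break
        f0, t = cost(A, B, U, V), 1.0
        Un, Vn = retract(U, -t * rU), retract(V, -t * rV)
        # Armijo backtracking: shrink t until sufficient decrease is obtained.
        while cost(A, B, Un, Vn) > f0 - sigma * t * g2 and t > 1e-14:
            t *= beta
            Un, Vn = retract(U, -t * rU), retract(V, -t * rV)
        if cost(A, B, Un, Vn) > f0:
            break                     # no decrease found; stop
        U, V = Un, Vn
    return U, V

rng = np.random.default_rng(6)
m, n, k = 5, 4, 2
rc = lambda r, c: rng.standard_normal((r, c)) + 1j * rng.standard_normal((r, c))
A, B = her(rc(m, m)), her(rc(n, n))
U0, _ = np.linalg.qr(rc(m, k))
V0, _ = np.linalg.qr(rc(n, k))
initial_cost = cost(A, B, U0, V0)
U_opt, V_opt = steepest_descent(A, B, U0, V0)
final_cost = cost(A, B, U_opt, V_opt)
print(initial_cost, final_cost)
```

Steepest descent only guarantees a non-increasing cost and convergence towards stationary points; for Hermitian A and B the value given by Theorem 1.5 can serve as a reference for how close a run gets to the global minimum.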

Some iterative methods, like Newton's method, use higher-order derivatives of/. 
We further plan to investigate several of these methods in order to solve the 
problems mentioned above. 



4.6 Relation to the Crawford Number 

The field of values of a square matrix A is defined as the set of complex numbers

$$\mathcal{F}(A) := \{x^*Ax : x^*x = 1\},$$

and is known to be a closed convex set [9]. The Crawford number is defined as the
distance from that compact set to the origin

$$\mathrm{Cr}(A) := \min\{|\lambda| : \lambda \in \mathcal{F}(A)\},$$

and can be computed e.g. with techniques described in [9]. One could define the
generalized Crawford number of two matrices A and B as the distance between
$\mathcal{F}(A)$ and $\mathcal{F}(B)$, i.e.

$$\mathrm{Cr}(A, B) := \min\{|\lambda - \mu| : \lambda \in \mathcal{F}(A),\ \mu \in \mathcal{F}(B)\}.$$

Clearly, $\mathrm{Cr}(A, 0) = \mathrm{Cr}(A)$, which thus generalizes the concept. Moreover this is a
special case of our problem since

$$\mathrm{Cr}(A, B) = \min_{U^*U = V^*V = 1} \|U^*AU - V^*BV\|.$$

One can say that Problem 1 is a k-dimensional extension of this problem.
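
For small matrices the Crawford number can be approximated with a simple angle sweep. The sketch below does not reproduce the techniques of [9]; it relies on the standard support-function description of a compact convex set, which gives $\mathrm{Cr}(A) = \max\big(0, \max_\theta \lambda_{\min}(\mathrm{Her}(e^{i\theta}A))\big)$, and replaces the maximum over $\theta$ by a maximum over a uniform grid, so the result is an approximation from below. The function name `crawford_number` and the grid size are assumptions of this illustration.

```python
import numpy as np

def crawford_number(A, num_angles=720):
    """Grid approximation (from below) of the distance from the field of values to 0."""
    best = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        R = np.exp(1j * theta) * A
        H = (R + R.conj().T) / 2                      # Hermitian part of the rotated matrix
        best = max(best, np.linalg.eigvalsh(H)[0])    # smallest eigenvalue
    return best

cr_definite = crawford_number(np.diag([1.0, 2.0]))    # field of values is [1, 2]
cr_indefinite = crawford_number(np.diag([1.0, -1.0])) # field of values is [-1, 1]
print(cr_definite, cr_indefinite)
```

For the first matrix the field of values is the segment [1, 2], at distance 1 from the origin; for the second it contains the origin, so the Crawford number vanishes.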

Acknowledgments We thank the reviewers for their useful remarks and for mentioning
reference [6].

Chapter 5

A Framelet-Based Algorithm for Video Enhancement

Raymond H. Chan, Yiqiu Dong and Zexi Wang 



Abstract Video clips are made up of many still frames. Most of the time, the
frames are small perturbations of their neighboring frames. Recently, we proposed
a framelet-based algorithm to enhance the resolution of any frame in a video clip
by solving it as a super-resolution image reconstruction problem. In this paper, we 
extend the algorithm to video enhancement, where we compose a high-resolution 
video from a low-resolution one. An experimental result of our algorithm on a real 
video clip is given to illustrate the performance. 



5.1 Introduction 

High-resolution images are useful in remote sensing, surveillance, military
imaging, and medical imaging, see for instance [4, 12, 14]. However, high-
resolution images are more expensive to obtain than low-resolution ones,
which can be obtained from an array of inexpensive low-resolution sensors.
Therefore, there has been much interest in reconstructing high-resolution images
from low-resolution ones that are small perturbations of each other, see [9, 11, 13,
15, 17, 20]. One approach is based on the maximum likelihood technique using the
expectation maximization algorithm to seek the high-resolution image [3]. Another
approach is the regularization method in [10], which is based on the total squared
error between the observed low-resolution images and the predicted low-resolution
images. The predicted images are the results of projecting the high-resolution
image estimate through the observation model. The framelet algorithm proposed
in [5, 6] is different from these methods. It applies the unitary extension principle
in [16] to form a system of tight frame filters. In this approach, there is only
matrix-vector multiplication, and no need for solving a minimization problem.
Recently, it was shown that this framelet algorithm, which is an iterative
algorithm, indeed converges to a minimizer of a variational problem [2].

Video clips are made up of many still frames (about 25-30 frames per second),
and the scene usually does not change much from one frame to the next. Thus
given a reference frame, its nearby frames can be considered as small
perturbations of it, and we can make use of them to get a high-resolution image of
the reference frame. More precisely, consider a sequence of frames
$\{f_k\}_{k=-K}^{K}$ in a video clip, where $k$ increases with the time when the
frame $f_k$ is taken. Let $f_0$ be the reference frame whose resolution we want to
enhance. The frames $\{f_k\}_{k\neq 0}$ can be considered as small spatial
perturbations of $f_0$. Then, we can use the framelet algorithm proposed in [5] to
improve the resolution of $f_0$. Such a still enhancement algorithm is given in [7].

The goal of this paper is to extend the algorithm in [7] to video enhancement, 
where high-resolution video streams are constructed from low-resolution ones. 
The paper is organized as follows. In Sect. 5.2, we introduce the framelet algo- 
rithm for the high-resolution image reconstruction given in [5]. In Sect. 5.3, we 
describe the algorithm proposed in [7] for enhancing video stills. Then in Sect. 5.4, 
we extend it to video enhancement and apply the resulting algorithm on a real 
video to enhance the video resolution. Conclusion is given in Sect. 5.5. 

In this paper, we use bold-face characters to indicate vectors. If $f$ represents an
image $f(x,y)$, then $\mathbf{f}$ represents the column vector constructed by raster
scanning $f$ row by row.



5.2 High-Resolution Image Reconstruction 
5.2.1 The Model 

Here we briefly recall the high-resolution image reconstruction model introduced
in [1]. For more details, please refer to that paper. Let $h$ be a piecewise continuous
function measuring the intensity of a scene. An image of $h$ at sampling resolution
$T$ can be modeled by the integral

\[ h(n_1,n_2) = \frac{1}{T^2} \int_{(n_2-\frac12)T}^{(n_2+\frac12)T} \int_{(n_1-\frac12)T}^{(n_1+\frac12)T} h(x,y)\,dx\,dy, \qquad n_1,n_2 \in \mathbb{Z}. \tag{5.1} \]

Here $(n_1,n_2)$ are the pixel locations. High-resolution image reconstruction refers
to the construction of an image with sampling resolution $T$ by using $K^2$ low-
resolution images of sampling resolution $KT$, where $K$ is a positive integer. In this
paper, we only consider $K = 2$. Larger values of $K$ can be handled similarly, but
with more complicated notation.

When $K = 2$, we are given four low-resolution images, $g_{0,0}, g_{0,1}, g_{1,0}, g_{1,1}$, of
sampling resolution $2T$, sampled as

\[ g_{i,j}(n_1',n_2') = \frac{1}{4T^2} \int_{(2(n_2'-\frac12)+j)T}^{(2(n_2'+\frac12)+j)T} \int_{(2(n_1'-\frac12)+i)T}^{(2(n_1'+\frac12)+i)T} h(x,y)\,dx\,dy, \tag{5.2} \]

where $i, j = 0, 1$. The locations (0,0), (0,1), (1,0) and (1,1) are the sensor
positions.

A straightforward way to form an image $g$ of sampling resolution $T$ is to
interlace the four low-resolution images, i.e.

\[ g(n_1,n_2) = g_{i,j}(n_1',n_2'), \tag{5.3} \]

where $i = n_1 \bmod 2$, $j = n_2 \bmod 2$, $n_1' = \lfloor n_1/2 \rfloor$, $n_2' = \lfloor n_2/2 \rfloor$, see Fig. 5.1. The
function $g$ is called the observed high-resolution image.
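
The interlacing rule (5.3) can be sketched with array slicing. This is only an illustration: the function name `interlace` and the dictionary layout of the four low-resolution images are assumptions made for the example, with the first array axis playing the role of $n_1$.

```python
import numpy as np

def interlace(lowres):
    """Interlace four M-by-M images into one 2M-by-2M image as in (5.3).

    lowres is a hypothetical dict mapping the sensor position (i, j)
    to the low-resolution image g_{i,j}.
    """
    M = lowres[(0, 0)].shape[0]
    g = np.empty((2 * M, 2 * M), dtype=lowres[(0, 0)].dtype)
    for (i, j), g_ij in lowres.items():
        # pixel (n1, n2) with n1 = 2*n1' + i and n2 = 2*n2' + j
        g[i::2, j::2] = g_ij
    return g
```

Each pixel $(n_1, n_2)$ of the result is then taken from the image at sensor position $(n_1 \bmod 2, n_2 \bmod 2)$.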

Note that $g$ is not equal to the desired image $h$ in (5.1) but is a good
approximation of it. If we assume $h(x,y)$ is constant in the sampling region

\[ [(n_1 - 0.5)T, (n_1 + 0.5)T) \times [(n_2 - 0.5)T, (n_2 + 0.5)T) \]

for all $n_1,n_2 \in \mathbb{Z}$ (i.e. $h(x,y) = h(n_1,n_2)$ there), then by (5.1)-(5.3), we can easily
prove that

\[
\begin{aligned}
g(n_1,n_2) = \frac14 \Big[ &\tfrac14 h(n_1-1,n_2-1) + \tfrac12 h(n_1-1,n_2) + \tfrac14 h(n_1-1,n_2+1) \\
 &+ \tfrac12 h(n_1,n_2-1) + h(n_1,n_2) + \tfrac12 h(n_1,n_2+1) \\
 &+ \tfrac14 h(n_1+1,n_2-1) + \tfrac12 h(n_1+1,n_2) + \tfrac14 h(n_1+1,n_2+1) \Big].
\end{aligned}
\]

Fig. 5.1 The model of obtaining $g$ by interlacing pixels from $g_{i,j}$ (open square:
$g_{0,0}$ pixels; filled square: $g_{0,1}$ pixels; filled triangle: $g_{1,0}$ pixels; open
triangle: $g_{1,1}$ pixels)



In matrix form, it is

\[ \mathbf{g} = H_{0,0}\mathbf{h}, \tag{5.4} \]

where $H_{0,0} = H_0 \otimes H_0$ with $H_0$ being the matrix representation of the discrete
convolution (i.e. Toeplitz form) with kernel $h_0 = [1/4, 1/2, 1/4]$.

To obtain a better high-resolution image than $\mathbf{g}$, one has to solve for $\mathbf{h}$ in
(5.4). This is an ill-posed problem for which many methods are available. One
approach is the framelet method in [5], which we describe next.
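
The structure of the blurring matrix in (5.4) can be checked numerically. The sketch below is an illustration under one stated assumption: the text does not specify boundary conditions here, so periodic (circulant) boundaries are used instead of a particular Toeplitz truncation. The helper name `conv_matrix` is hypothetical.

```python
import numpy as np

def conv_matrix(kernel, n):
    """n-by-n convolution matrix for a 3-tap kernel, periodic boundaries.

    Hypothetical helper: kernel = [k_-1, k_0, k_+1] weights the left
    neighbor, the pixel itself, and the right neighbor.
    """
    H = np.zeros((n, n))
    for row in range(n):
        for offset, weight in zip((-1, 0, 1), kernel):
            H[row, (row + offset) % n] += weight
    return H

n = 6
H0 = conv_matrix([0.25, 0.5, 0.25], n)   # 1D low-pass filter h_0
H00 = np.kron(H0, H0)                    # 2D blur acting on a raster-scanned image

h = np.random.default_rng(0).random((n, n))
g = (H00 @ h.ravel()).reshape(n, n)      # observed image, g = H_{0,0} h
```

With row-by-row raster scanning, applying the Kronecker product is the same as filtering the rows and the columns separately, so `g` agrees with `H0 @ h @ H0.T` and reproduces the nine-point average of the neighboring pixels.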



5.2.2 Framelet-Based HR Image Reconstruction 

Here we briefly recall the algorithm in [5] and refer the reader to that paper for
more details. The convolution kernel $h_0$ is a low-pass filter. By applying the
unitary extension principle in [16], $h_0$ together with the following high-pass filters
forms a tight framelet system:

\[ h_1 = \left[\frac{\sqrt{2}}{4},\, 0,\, -\frac{\sqrt{2}}{4}\right], \qquad h_2 = \left[-\frac14,\, \frac12,\, -\frac14\right]. \tag{5.5} \]



Define

\[ H_{i,j} = H_i \otimes H_j, \qquad 0 \le i, j \le 2, \]

where $H_i$ is the discrete convolution matrix with kernel $h_i$. The perfect
reconstruction formula for the framelet system gives

\[ \mathbf{h} = \sum_{i,j=0}^{2} H_{i,j}^* H_{i,j} \mathbf{h}, \]

see [5]. Based on (5.4) and substituting $H_{0,0}\mathbf{h}$ by $\mathbf{g}$, we have

\[ \mathbf{h} = H_{0,0}^* \mathbf{g} + \sum_{(i,j) \neq (0,0)} H_{i,j}^* H_{i,j} \mathbf{h}. \tag{5.6} \]

Fig. 5.2 The operator $\mathcal{T}_v$ defined recursively with $\mathcal{T}_0 = I$, the identity



Images are usually contaminated with noise, which is high-frequency in
nature. Since all the filter matrices except $H_{0,0}$ are high-pass, the noise is
magnified in the second term of (5.6). We can use a threshold operator $\mathcal{D}_\lambda$ to
remove the noise. The iteration, in matrix terms, thus becomes

\[ \mathbf{h}^{(n+1)} = H_{0,0}^* \mathbf{g} + \sum_{(i,j) \neq (0,0)} H_{i,j}^* \mathcal{D}_\lambda\big(H_{i,j} \mathbf{h}^{(n)}\big), \qquad n = 0, 1, \ldots, \]

where $\mathbf{h}^{(0)}$ is the initial guess. Here we use Donoho's soft thresholding operator
[8]:

\[ \mathcal{D}_\lambda(\mathbf{x}) = \big(t_\lambda(x_1), \ldots, t_\lambda(x_L)\big)^T, \]

where $t_\lambda(x) = \operatorname{sgn}(x) \max(|x| - \lambda, 0)$, $\lambda = 2\sigma\sqrt{\log L}$, $L$ is the length of the vector
$\mathbf{x}$, and $\sigma$ is the variance of the noise estimated numerically by the method in [8].
However, to avoid removing too many high-frequency components, we
use wavelet packets to further decompose the high-frequency components before
doing the thresholding. In essence, we replace the operator $\mathcal{D}_\lambda$ by the recursively
defined $\mathcal{T}_v$ shown in Fig. 5.2. The operator $\mathcal{T}_v$ first decomposes $\mathbf{w}$ down to
level $v$ and then thresholds all the coefficients except the low-frequency ones on
level $v$. In matrix terms, we have

\[ \mathbf{h}^{(n+1)} = H_{0,0}^* \mathbf{g} + \sum_{(i,j) \neq (0,0)} H_{i,j}^* \mathcal{T}_v\big(H_{i,j} \mathbf{h}^{(n)}\big), \qquad n = 0, 1, \ldots \tag{5.7} \]
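
The thresholded iteration can be sketched in a few lines. This is a simplified illustration and not the algorithm of [5]: it uses the one-level soft thresholding operator in place of the wavelet-packet operator of Fig. 5.2, it assumes periodic boundary conditions, and the names `conv_matrix`, `framelet_matrices`, `soft_threshold` and `framelet_step` are hypothetical.

```python
import numpy as np

def conv_matrix(kernel, n):
    """n-by-n periodic convolution matrix for a 3-tap kernel (assumed boundaries)."""
    H = np.zeros((n, n))
    for row in range(n):
        for offset, weight in zip((-1, 0, 1), kernel):
            H[row, (row + offset) % n] += weight
    return H

def framelet_matrices(n):
    """All nine 2D filter matrices H_{i,j} = H_i (x) H_j for the kernels in (5.5)."""
    s = np.sqrt(2.0) / 4.0
    kernels = [[0.25, 0.5, 0.25], [s, 0.0, -s], [-0.25, 0.5, -0.25]]
    H1d = [conv_matrix(k, n) for k in kernels]
    return {(i, j): np.kron(H1d[i], H1d[j]) for i in range(3) for j in range(3)}

def soft_threshold(x, lam):
    """Componentwise soft thresholding t_lambda."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def framelet_step(h, g, H, lam):
    """One iteration: low-pass part from g, thresholded high-pass parts from h."""
    new_h = H[(0, 0)].T @ g
    for key, Hij in H.items():
        if key != (0, 0):
            new_h = new_h + Hij.T @ soft_threshold(Hij @ h, lam)
    return new_h
```

With these periodic matrices the nine products $H_{i,j}^* H_{i,j}$ sum to the identity, so for a zero threshold and noise-free data $\mathbf{g} = H_{0,0}\mathbf{h}$ the true image is a fixed point of the step.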
5.2.3 HR Reconstruction with Displacement Errors

Before we can apply the algorithm in Sect. 5.2.2 to improve video quality, there is
one problem we have to overcome. In enhancing videos, the frames in the video
may not be aligned exactly by length $T$ as in (5.3) or in Fig. 5.1. For example,
relative to the reference frame, a nearby frame may have moved in the $x$-direction
by a distance of $\ell T$ where $\ell = n + r$, with $n \in \mathbb{Z}$ and $|r| < 1$. In that case, if we
want to apply the algorithm in the last section, we can first shift the frame back by
$nT = (n/2)(2T)$ and then consider the shifted frame as a displaced frame of the
reference frame with displacement error equal to $r/2$. The displacement error can
then be corrected by framelet systems as follows. We refer the reader to [5] for
more details.

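Define the 2D downsampling and upsampling matrices $D_{i,j} = D_j \otimes D_i$ and
$U_{i,j} = U_j \otimes U_i$, where $D_i = I_M \otimes e_i^T$ and $U_i = I_M \otimes e_i$ $(i = 0, 1)$, $I_M$ is the identity
of size $M$, $e_0 = (1,0)^T$ and $e_1 = (0,1)^T$. Here $M$-by-$M$ is the resolution of the
low-resolution frame. Then we have

\[ \mathbf{g}_{i,j} = D_{i,j}\mathbf{g} \quad \text{and} \quad \mathbf{g} = \sum_{i,j=0}^{1} U_{i,j}\mathbf{g}_{i,j}. \tag{5.8} \]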
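
The two identities in (5.8) are easy to verify numerically. The sketch below builds the sampling matrices with `np.kron`; the dictionary names `D` and `U` are assumptions made for the example. Which image axis each index refers to depends on the raster-scanning convention; here NumPy's row-major flattening is assumed.

```python
import numpy as np

M = 3                                         # low-resolution frame is M-by-M
e = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]

# 1D operators: D_i keeps entries 2m + i, U_i puts them back with zeros between
D1 = [np.kron(np.eye(M), e[i].T) for i in range(2)]   # M x 2M
U1 = [np.kron(np.eye(M), e[i]) for i in range(2)]     # 2M x M

# 2D operators D_{i,j} = D_j (x) D_i and U_{i,j} = U_j (x) U_i
D = {(i, j): np.kron(D1[j], D1[i]) for i in range(2) for j in range(2)}
U = {(i, j): np.kron(U1[j], U1[i]) for i in range(2) for j in range(2)}

g = np.arange(4 * M * M, dtype=float)         # a raster-scanned 2M-by-2M image
parts = {key: D[key] @ g for key in D}        # the four low-resolution images
g_back = sum(U[key] @ parts[key] for key in U)  # interlacing restores g
```

Under the row-major assumption, `D[(i, j)] @ g` picks out the sub-image `G[j::2, i::2]` of the 2D array `G = g.reshape(2*M, 2*M)`.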

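As mentioned above, in practice, what we obtain is a shifted version of $g_{i,j}$,
i.e. we have $\tilde{g}_{i,j}(\cdot,\cdot) = g_{i,j}(\cdot + \epsilon^x_{i,j}, \cdot + \epsilon^y_{i,j})$, where $0 \le |\epsilon^x_{i,j}| < 0.5$, $0 \le |\epsilon^y_{i,j}| < 0.5$,
$0 \le i, j \le 1$. The parameters $\epsilon^x_{i,j}$ and $\epsilon^y_{i,j}$ are called the displacement errors. As in
(5.4) and (5.8), the observed low-resolution image $\tilde{\mathbf{g}}_{i,j}$ can be considered as the
down-sample of $\mathbf{h}$ after it has passed through a filter corresponding to the 2D
framelet filter matrix $H = H(\epsilon^y_{i,j}) \otimes H(\epsilon^x_{i,j})$, where $H(\epsilon)$ denotes the 1D filter
$\frac14[1 - \epsilon,\, 2,\, 1 + \epsilon]$. More precisely, we have

\[ \tilde{\mathbf{g}}_{i,j} = D_{i,j} H \mathbf{h}. \]

Then $\mathbf{g}_{i,j}$ can be obtained from $\tilde{\mathbf{g}}_{i,j}$ via

\[ \mathbf{g}_{i,j} = D_{i,j} H_{0,0} \mathbf{h} = D_{i,j}[H - (H - H_{0,0})]\mathbf{h} = \tilde{\mathbf{g}}_{i,j} - D_{i,j}(H - H_{0,0})\mathbf{h}. \]

Hence we have

\[ \mathbf{g} = \sum_{i,j=0}^{1} U_{i,j}\mathbf{g}_{i,j} = \sum_{i,j=0}^{1} U_{i,j}\big[\tilde{\mathbf{g}}_{i,j} - D_{i,j}(H - H_{0,0})\mathbf{h}\big] = \tilde{\mathbf{g}} - (H - H_{0,0})\mathbf{h}. \tag{5.9} \]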

Substituting (5.9) into (5.7), we arrive at the 2D image resolution enhancement
formula:

\[ \mathbf{h}^{(n+1)} = H_{0,0}^*\big[\tilde{\mathbf{g}} - (H - H_{0,0})\mathbf{h}^{(n)}\big] + \sum_{(i,j) \neq (0,0)} H_{i,j}^* \mathcal{T}_v\big(H_{i,j}\mathbf{h}^{(n)}\big). \tag{5.10} \]

We depict this algorithm graphically in Fig. 5.3.

Fig. 5.3 Framelet-based resolution enhancement algorithm for 2D images, see (5.10)



5.3 Resolution Enhancement for Video Clips 

Video clips consist of many still frames. Each frame can be considered as a
perturbation of its nearby frames. Therefore, we may generate higher resolution
images of any frame in the video by exploiting the high redundancy between
nearby frames. More precisely, consider a sequence of frames $\{f_k\}_{k=-K}^{K}$ in a given
video clip, where $k$ increases with the time when the frame $f_k$ is captured. We aim
to improve the resolution of the reference frame $f_0$ by incorporating information
from the frames $\{f_k\}_{k \neq 0}$. Without loss of generality, we can assume that $f_0$ is the
low-resolution image at the (0,0) sensor position without any displacement error.

An algorithm for video still enhancement is given in [7], which is an adaptation
of the algorithm in (5.10). Basically, we have to tackle the following issues:

1. for each frame $f_k$, we have to estimate its sensor position and displacement
errors with respect to $f_0$, and

2. it may be that not all low-resolution images at all sensor positions are available
in the video.

Here we recall the ways we handled these issues in [7].



5.3.1 Estimating the Motion Parameters 



For computational efficiency, we assume that the frames $\{f_k\}_{k \neq 0}$ are related to $f_0$
by an affine transform, i.e.

\[ f_k(R_k\mathbf{x} - \mathbf{r}_k) \approx f_0(\mathbf{x}), \qquad k \neq 0, \]

where $\mathbf{x}$ are the coordinates of the pixels in the region of interest. Denote the six
entries of the matrix $R_k$ and the vector $\mathbf{r}_k$ in

\[ R_k\mathbf{x} - \mathbf{r}_k \tag{5.11} \]

by $c_0^{(k)}, c_1^{(k)}, \ldots, c_5^{(k)}$. Our task is to estimate the parameters $c_i^{(k)}$ that minimize the
difference between $f_k$ and $f_0$, that is,

\[ \big(c_0^{(k)}, c_1^{(k)}, \ldots, c_5^{(k)}\big) = \operatorname*{argmin} \sum_{j \in \mathcal{I}} \big[f_k(R_k\mathbf{x}_j - \mathbf{r}_k) - f_0(\mathbf{x}_j)\big]^2, \]

where $\mathcal{I}$ is the index set of pixels in the region of interest, which may be the entire
image or part of the image. Many methods can be used to solve this minimization
problem, such as the Levenberg-Marquardt iterative nonlinear minimization
algorithm [18].

With $R_k$ and $\mathbf{r}_k$, we can compute the sensor position $(s^x_k, s^y_k)$ with $s^x_k, s^y_k \in \{0, 1\}$
and the displacement errors $(\epsilon^x_k, \epsilon^y_k)$ for the frame $f_k$ with respect to $f_0$. Since
$f_k(R_k\mathbf{x} - \mathbf{r}_k) = f_k(R_k(\mathbf{x} - R_k^{-1}\mathbf{r}_k))$, it can be viewed as a translation of $f_0$ with
displacement vector $-R_k^{-1}\mathbf{r}_k$. Our task is to write

\[ R_k^{-1}\mathbf{r}_k = \mathbf{u}_k + \frac12 \begin{bmatrix} s^x_k \\ s^y_k \end{bmatrix} + \frac12 \begin{bmatrix} \epsilon^x_k \\ \epsilon^y_k \end{bmatrix}, \tag{5.12} \]

where $\mathbf{u}_k$ has integer entries. Then, $\tilde{f}_k(\mathbf{x}) = f_k(R_k(\mathbf{x} - \mathbf{u}_k))$ can be considered as the
low-resolution image $\tilde{g}_{s^x_k, s^y_k}$ with displacement errors $(\epsilon^x_k, \epsilon^y_k)$ at sensor position
$(s^x_k, s^y_k)$. The algorithm is as follows.

Algorithm 1 $(\tilde{f}(\mathbf{x}), s^x, s^y, \epsilon^x, \epsilon^y)$ from $(f, f_0)$: locate the frame $f$ against the reference
frame $f_0$.

1 Compute $[\tilde{r}_1, \tilde{r}_2]^T = R^{-1}\mathbf{r}$.

2 Let $\mathbf{u} = [\lfloor \tilde{r}_1 + \frac14 \rfloor, \lfloor \tilde{r}_2 + \frac14 \rfloor]^T$; then $[d_1, d_2] = [\tilde{r}_1, \tilde{r}_2] - \mathbf{u}^T$ has entries in $[-\frac14, \frac34)$.

3 Let $[s^x, s^y] = [\lfloor 2d_1 + \frac12 \rfloor, \lfloor 2d_2 + \frac12 \rfloor]$; then $s^x, s^y \in \{0, 1\}$.

4 Let $[\epsilon^x, \epsilon^y] = [2d_1 - s^x, 2d_2 - s^y]$; then $|\epsilon^x|, |\epsilon^y| \le \frac12$ and (5.12) holds.

5 $\tilde{f}(\mathbf{x}) = f(R(\mathbf{x} - \mathbf{u}))$.

We use Fig. 5.4 to illustrate how the algorithm resolves the displacement in the
horizontal direction. The vertical direction is handled similarly.

Fig. 5.4 Resolving the horizontal displacement in Algorithm 1. First we compute the total
displacement $\tilde{r}_1$. Then we write $\tilde{r}_1 = u_1 + d_1$ where $d_1 \in [-1/4, 3/4)$. Then the sensor location
is $s^x = \lfloor 2d_1 + \frac12 \rfloor$ and the displacement error is $\epsilon^x = 2d_1 - s^x$
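
The arithmetic of Algorithm 1 (splitting the displacement $R^{-1}\mathbf{r}$ into an integer shift, a sensor position and a displacement error) can be sketched as follows. The function name `locate_frame` is hypothetical, and the motion parameters $R$ and $\mathbf{r}$ are assumed to have been estimated already; resampling the frame itself is not shown.

```python
import numpy as np

def locate_frame(R, r):
    """Split the displacement R^{-1} r into shift u, sensor position s, error eps.

    Hypothetical helper: distances are measured in low-resolution pixels, so
    the displacement equals u + (s + eps) / 2 in each direction.
    """
    r_tilde = np.linalg.solve(R, r)        # total displacement R^{-1} r
    u = np.floor(r_tilde + 0.25)           # integer shift
    d = r_tilde - u                        # remainder, entries in [-1/4, 3/4)
    s = np.floor(2.0 * d + 0.5)            # sensor position, entries in {0, 1}
    eps = 2.0 * d - s                      # displacement error, |eps| <= 1/2
    return u.astype(int), s.astype(int), eps
```

For example, a pure translation with $R = I$ and $\mathbf{r} = (3.3, -0.1)^T$ gives the shift $(3, 0)$, the sensor position $(1, 0)$ and displacement errors of about $(-0.4, -0.2)$.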



5.3.2 The Video Still Enhancement Algorithm 

After passing a frame $f$ through Algorithm 1, we then have the $(s^x, s^y)$th low-
resolution image with displacement error $(\epsilon^x, \epsilon^y)$, i.e. $\tilde{\mathbf{g}}_{s^x,s^y} = \mathbf{f}$.

But in the algorithm for image enhancement (5.10) (see also Fig. 5.3), we assume
not one, but a complete set of low-resolution images $\{\mathbf{g}_{i,j}\}_{i,j=0}^{1}$ at every sensor
position. To compensate for the missing low-resolution images, our idea is to
generate them by downsampling the current high-resolution approximation $\mathbf{h}$ of $f_0$
with zero displacement error, i.e.

\[ \mathbf{g}_{i,j} = \begin{cases} \mathbf{f} - D_{i,j}(H - H_{0,0})\mathbf{h}, & (i,j) = (s^x, s^y), \\ D_{i,j}H_{0,0}\mathbf{h}, & (i,j) \neq (s^x, s^y). \end{cases} \tag{5.13} \]



We use an alternating direction approach to obtain the high-resolution image $\mathbf{h}$. In
the $m$-th outer iteration step, we let $\mathbf{g}_{s^x,s^y} = \mathbf{f} - D_{s^x,s^y}(H - H_{0,0})\mathbf{h}_m^{(0)}$, where
$\mathbf{h}_m^{(0)} = \mathbf{h}_{m-1}$. Then, we iterate $\mathbf{h}_m^{(n)}$ with respect to $n$ as shown in Fig. 5.5, which is
in fact a modification of Fig. 5.3. When it converges, we take the limit as $\mathbf{h}_m$ and set
$\mathbf{h}_{m+1}^{(0)} = \mathbf{h}_m$. Once we get an update $\mathbf{h}_m$, we go into the next outer iteration; see
Fig. 5.6. The complete alternating direction algorithm is as follows.

Fig. 5.5 Inner iteration of Algorithm 2. Here, we fix the low-resolution image $\mathbf{g}_{s^x,s^y}$ and
compensate for the missing ones $\mathbf{g}_{i,j}$ by downsampling $H_{0,0}\mathbf{h}_m^{(n)}$. Then we have
$\mathbf{g} = \sum_{i,j=0}^{1} U_{i,j}\mathbf{g}_{i,j}$ as in (5.9) with $\mathbf{g}_{i,j}$ substituted by (5.13). We iterate $\mathbf{h}_m^{(n)}$ w.r.t. $n$
until it converges; then we set $\mathbf{h}_m = \mathbf{h}_m^{(n)}$

Fig. 5.6 Outer iteration of Algorithm 2. We update the high resolution image $\mathbf{h}$ by a frame $\mathbf{f}$ with
parameters $(s^x, s^y, \epsilon^x, \epsilon^y)$

Algorithm 2 $\mathbf{h} \leftarrow \text{Update}(\mathbf{h}, \mathbf{f}, s^x, s^y, \epsilon^x, \epsilon^y)$: update the high resolution image $\mathbf{h}$
by a frame $\mathbf{f}$ with parameters $(s^x, s^y, \epsilon^x, \epsilon^y)$.

1 Initialize $\mathbf{h}_0 = \mathbf{h}$, and set $m = 0$.

2 If $\text{PSNR}(D_{s^x,s^y}H\mathbf{h}_m, \mathbf{f}) < 50$ dB, set $\mathbf{h}_{m+1}^{(0)} = \mathbf{h}_m$ and $m = m + 1$; otherwise,
output $\mathbf{h} = \mathbf{h}_m$ and stop.

3 Iterate $\mathbf{h}_m^{(n)}$ w.r.t. $n$ until convergence (see Fig. 5.5):

(a) $\mathbf{g}_{i,j} = \begin{cases} \mathbf{f} - D_{i,j}(H - H_{0,0})\mathbf{h}_m^{(0)}, & (i,j) = (s^x, s^y), \\ D_{i,j}H_{0,0}\mathbf{h}_m^{(n)}, & (i,j) \neq (s^x, s^y). \end{cases}$

(b) $\mathbf{g} = \sum_{i,j=0}^{1} U_{i,j}\mathbf{g}_{i,j}$.

(c) $\mathbf{h}_m^{(n+1)} = H_{0,0}^*\mathbf{g} + \sum_{(i,j) \neq (0,0)} H_{i,j}^* \mathcal{T}_v\big(H_{i,j}\mathbf{h}_m^{(n)}\big)$.

4 Set $\mathbf{h}_m = \mathbf{h}_m^{(n)}$ after convergence, and go back to step 2.

Fig. 5.7 Resolution enhancement for video clips (the step in the dotted-line box is used to
determine if $\tilde{\mathbf{f}}_k$ is good enough to update $\mathbf{h}$)

In Fig. 5.7, we give the algorithm for the video still enhancement. Given a
reference frame $\mathbf{f}_0$, we use a sequence of $2K$ frames $\{\mathbf{f}_k\}_{k=-K}^{K}$, $k \neq 0$, that are taken just
before and after the reference frame $\mathbf{f}_0$. The step in the dotted-line box
determines whether the shifted frame $\tilde{\mathbf{f}}_k$ is close enough to $\mathbf{f}_0$; if not, we discard the frame.
In the experiments, we set $p = 25$ dB. Initially, we estimate $\mathbf{h}$ by bilinear
interpolation on $\mathbf{f}_0$, and then use the new information from good frames to update $\mathbf{h}$.
The advantage of our algorithm is that, based on the rule shown in the dotted-line
box, it only chooses the good candidate frames to enhance the resolution, and there
is no need to determine the number of frames to be used in advance.
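
The frame-selection rule can be illustrated with a small sketch. The text does not give its PSNR formula, so the common definition $10\log_{10}(\text{peak}^2/\text{MSE})$ with a peak value of 255 (8-bit images) is assumed here, and the function names `psnr` and `select_frames` are hypothetical.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB (assumed definition, 8-bit peak)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return np.inf
    return 10.0 * np.log10(peak ** 2 / mse)

def select_frames(reference, aligned_frames, p=25.0):
    """Keep the indices of aligned frames whose PSNR against the reference is at least p dB."""
    return [k for k, frame in aligned_frames.items() if psnr(frame, reference) >= p]
```

Frames that fail the test are simply skipped, so the number of frames actually used does not have to be fixed beforehand.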



5.4 Video Enhancement Algorithm 

Since video streams are made up of frames, and we can now improve the
resolution of each frame in a video stream, we can compose these high-resolution
frames together to generate a higher resolution video of the given video stream.
More precisely, we can apply our algorithm in Fig. 5.7 to the frames $\{f_k\}_{k=-K}^{K}$ to
enhance $f_0$, and then apply the algorithm again to the frames $\{f_k\}_{k=-K+1}^{K+1}$ to enhance $f_1$,
etc. Then, by combining the enhanced frames, we obtain a high-resolution video.

In this section, we test this idea on a video clip that we filmed by moving
our camera over a calendar. The video clip is in .avi format with size 520 x 480,
and can be downloaded at [19]. We first try the video still enhancement algorithm
in [7] to enhance the resolution of a frame in the video. From the seven seconds of
video, we choose the 60th frame $f_{60}$ as our reference frame, see Fig. 5.8a. In
Fig. 5.8b, we show the part of the reference frame $f_{60}$ (the area enclosed by the box
in Fig. 5.8a) whose resolution we try to improve.

Fig. 5.8 a The reference frame $f_{60}$, and b a part of $f_{60}$

Table 5.1 Alignment results from our algorithm

Frame index   $(s^x, s^y)$   $(\epsilon^x, \epsilon^y)$   $f_0(\mathbf{x}) \approx f(R\mathbf{x} + \mathbf{r})$
61            (1,0)          (0.119, 0.036)               Yes
59            (1,0)          (0.246, -0.066)              Yes
62            (0,0)          (0.368, 0.186)               Yes
58            (0,0)          (0.272, -0.126)              Yes
63            (1,0)          (-0.139, 0.086)              Yes
57            (1,0)          (0.334, -0.214)              Yes
64            (1,0)          (0.323, -0.194)              Yes
56            (0,0)          (-0.134, -0.381)             Yes
65            (0,0)          (-0.349, 0.164)              Yes
55            (0,1)          (0.454, 0.448)               Yes
66            (0,0)          (0.421, 0.181)               Yes
54            -              -                            No
67            (1,0)          (-0.219, 0.236)              Yes
53            (0,1)          (-0.323, 0.323)              Yes
68            (1,0)          (0.313, 0.232)               Yes
52            -              -                            No
69            (0,0)          (0.096, 0.204)               Yes
51            (0,0)          (0.292, -0.500)              Yes
70            (0,0)          (0.485, 0.186)               Yes
50            (0,1)          (0.062, 0.301)               Yes

We let $K = 10$, that is, we use the 50th to the 70th frames to improve the
resolution of $f_{60}$. The alignment parameters for this clip are listed in Table 5.1,
which shows that frames $f_{52}$ and $f_{54}$ are discarded. Figure 5.9a gives the first guess
of the high-resolution image of $f_{60}$ obtained by bilinear interpolation. The result from
our algorithm (i.e. Fig. 5.7) is shown in Fig. 5.9b. The calendar produced by our
method is much clearer than that produced by bilinear interpolation. Moreover, some
numbers, such as "16" and "18", which are now clearly discernible, are very
difficult to read from the video clip or from the bilinear interpolation alone.

The results show that the video still enhancement algorithm (Fig. 5.7) is
working. Next we extend it to video enhancement. Our aim is to obtain a
high-resolution video for the image in Fig. 5.8b. We use the video still
enhancement algorithm (Fig. 5.7) repeatedly to enhance all frames in $\{f_k\}_{k=40}^{63}$. More
precisely, frames $\{f_k\}_{k=\ell-10}^{\ell+10}$ are used to improve the resolution of frame $f_\ell$,
with $40 \le \ell \le 63$. The enhanced stills are put back together to get a
higher-resolution video with 24 frames. The original clip and the resulting clips are given in
[19]. Because the new video has higher resolution, it is four times the size and is much
clearer.
Fig. 5.9 Reconstructed high-resolution images: a using bilinear interpolation, b using
our algorithm in Fig. 5.7

5.5 Conclusion



In this paper, we give a short survey of the framelet algorithm proposed in [7] for
high-resolution still enhancement from video clips. We then extend it to video
enhancement. Simulation results show that our framelet algorithm can reveal
information that is not discernible in the original video clips or by simple
interpolation of any particular frame in the video.

By modifying the motion estimation Eq. 5.11, our framelet algorithm can
also be extended to more complicated motions. So far, we have not yet made use
of the sparsity of the tight-frame coefficients across frames. How to make use of it
is an interesting topic for our future work.
Chapter 6

Perturbation Analysis of the Mixed-Type Lyapunov Equation



Mingsong Cheng and Shufang Xu 



Abstract This paper concerns the mixed-type Lyapunov equation $X = A^*XB +
B^*XA + Q$, where $A$, $B$, and $Q$ are $n \times n$ complex matrices and $A^*$ denotes the conjugate
transpose of a matrix $A$. A perturbation bound for the solution to this matrix equation is
derived, an explicit expression of the condition number is obtained, and the backward
error of an approximate solution is evaluated by using the techniques developed in Sun
(Linear Algebra Appl 259:183-208, 1997) and Sun and Xu (Linear Algebra Appl
362:211-228, 2003). The results are illustrated by some numerical examples.



6.1 Introduction 

Consider the mixed-type Lyapunov matrix equation

\[ X = A^*XB + B^*XA + Q, \tag{6.1} \]

where $A, B, Q \in \mathbb{C}^{n \times n}$. Here $\mathbb{C}^{n \times n}$ denotes the set of all $n \times n$ complex matrices and $A^*$
the conjugate transpose of a matrix $A$. This kind of equation arises in Newton's
method for solving the matrix equation

\[ X - A^*X^{-2}A = I, \tag{6.2} \]

which has recently been studied by Ivanov, Liu, and others [2, 4, 5]. In Newton's method, we
must solve the following equation in each iterative step:

\[ X_{k+1} + A^*X_k^{-2}X_{k+1}X_k^{-1}A + A^*X_k^{-1}X_{k+1}X_k^{-2}A = I + 3A^*X_k^{-2}A, \tag{6.3} \]

which is a mixed-type Lyapunov equation (see [1, 9] for more details).

The matrix equation (6.1) can be regarded as a generalization of the discrete
and continuous Lyapunov equations. In fact, when $B = \frac12 A$, the matrix equation
(6.1) is just the discrete Lyapunov equation

\[ X = A^*XA + Q; \tag{6.4} \]

when $B = I$ and $A = \tilde{A} + \frac12 I$, it is just the continuous Lyapunov equation

\[ \tilde{A}^*X + X\tilde{A} + Q = 0. \tag{6.5} \]

This shows why we call it the mixed-type Lyapunov equation. The solvability of this
equation has been studied in our former paper [9]. In this paper, our main purpose
is threefold. To begin with, we derive a perturbation bound for the solution $X$.
Secondly, we apply the theory of condition developed by Rice [6] to define a
condition number of $X$, and moreover, we use the techniques developed in [8] to
derive its explicit expressions. Finally, we use the techniques developed in [7] to
evaluate the backward error of an approximate solution.

We start with some notation which we shall use throughout this paper. Define
the linear operator

\[ \mathbf{L}(X) = X - A^*XB - B^*XA, \qquad X \in \mathbb{C}^{n \times n}; \tag{6.6} \]

then the mixed-type Lyapunov equation (6.1) can be written as $\mathbf{L}(X) = Q$.
Throughout this paper we always assume that the mixed-type Lyapunov equation
(6.1) has a unique solution $X$, i.e., the linear operator $\mathbf{L}$ is invertible. We use $\mathbb{C}^{n \times n}$
(or $\mathbb{R}^{n \times n}$) to denote the set of complex (or real) $n \times n$ matrices. $A^*$ denotes the
conjugate transpose of a matrix $A$, $A^T$ the transpose of $A$, $A^\dagger$ the Moore-Penrose
inverse of $A$, and $\bar{A}$ the conjugate of $A$. The symbols $\|\cdot\|$, $\|\cdot\|_2$, and $\|\cdot\|_F$ denote
a unitarily invariant norm, the spectral norm, and the Frobenius norm, respectively.
For $A = [a_1, \ldots, a_n] = [a_{ij}] \in \mathbb{C}^{n \times n}$ and a matrix $B$, $A \otimes B = [a_{ij}B]$ is a Kronecker
product, and $\operatorname{vec}(A)$ is a vector defined by $\operatorname{vec}(A) = [a_1^T, \ldots, a_n^T]^T$.
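
For small $n$, the equation $\mathbf{L}(X) = Q$ can be solved directly through the Kronecker-product form of the operator, using the identity $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$. The sketch below is only an illustration of the notation, not a recommended solver for large problems; the function names `lyapunov_matrix` and `solve_mixed_lyapunov` are hypothetical.

```python
import numpy as np

def lyapunov_matrix(A, B):
    """Matrix of the operator X -> X - A* X B - B* X A acting on vec(X)."""
    n = A.shape[0]
    return (np.eye(n * n)
            - np.kron(B.T, A.conj().T)    # vec(A* X B) = (B^T (x) A*) vec(X)
            - np.kron(A.T, B.conj().T))   # vec(B* X A) = (A^T (x) B*) vec(X)

def solve_mixed_lyapunov(A, B, Q):
    """Solve X = A* X B + B* X A + Q, assuming the operator is invertible."""
    n = A.shape[0]
    # order="F" stacks the columns, matching the definition of vec
    x = np.linalg.solve(lyapunov_matrix(A, B), Q.reshape(-1, order="F"))
    return x.reshape(n, n, order="F")
```

Choosing $B = \frac12 A$ in this sketch returns the solution of the discrete Lyapunov equation (6.4), which gives a simple consistency check.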



6.2 Perturbation Bound 

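Let $X$ be the unique solution of the mixed-type Lyapunov equation (6.1), and let
the coefficient matrices $A$, $B$, and $Q$ be slightly perturbed to

\[ \tilde{A} = A + \Delta A, \qquad \tilde{B} = B + \Delta B, \qquad \tilde{Q} = Q + \Delta Q, \]

respectively, where $\Delta A, \Delta B, \Delta Q \in \mathbb{C}^{n \times n}$. In this section we consider perturbation
bounds for the solution $X$.

Let $\tilde{X} = X + \Delta X$ with $\Delta X \in \mathbb{C}^{n \times n}$ satisfy the perturbed matrix equation

\[ \tilde{X} = \tilde{A}^*\tilde{X}\tilde{B} + \tilde{B}^*\tilde{X}\tilde{A} + \tilde{Q}. \tag{6.7} \]

Subtracting (6.1) from (6.7), we have

\[ \mathbf{L}(\Delta X) = \Delta Q + h(\Delta A, \Delta B), \tag{6.8} \]

where $\mathbf{L}$ is defined by (6.6) and

\[ h(\Delta A, \Delta B) = (\Delta A)^*\tilde{X}B + (\Delta B)^*\tilde{X}A + A^*\tilde{X}\Delta B + B^*\tilde{X}\Delta A + (\Delta A)^*\tilde{X}\Delta B + (\Delta B)^*\tilde{X}\Delta A. \]

Since the linear operator $\mathbf{L}$ is invertible, we can rewrite (6.8) as

\[ \Delta X = \mathbf{L}^{-1}(\Delta Q) + \mathbf{L}^{-1}(h(\Delta A, \Delta B)). \tag{6.9} \]

Define

\[ \|\mathbf{L}^{-1}\| = \max_{W \in \mathbb{C}^{n \times n},\ \|W\| = 1} \|\mathbf{L}^{-1}W\|. \]

It follows that

\[ \|\mathbf{L}^{-1}W\| \le \|\mathbf{L}^{-1}\|\,\|W\|, \qquad W \in \mathbb{C}^{n \times n}. \tag{6.10} \]

Now let

\[ \alpha = \|A\|, \quad \beta = \|B\|, \quad \delta = \|\Delta Q\|, \quad l = \|\mathbf{L}^{-1}\|^{-1}, \quad \gamma = \|X\|, \tag{6.11} \]

and define

\[ \epsilon = \alpha\|\Delta B\| + \beta\|\Delta A\| + \|\Delta A\|\,\|\Delta B\|. \tag{6.12} \]

Then we can state the main result of this section as follows.

Theorem 1.1 If $2\epsilon < l$, then the perturbed matrix equation (6.7) has a unique
solution $\tilde{X}$ such that

\[ \|\tilde{X} - X\| \le \frac{\delta + 2\epsilon\gamma}{l - 2\epsilon} \equiv \delta_*. \tag{6.13} \]

Proof Let

\[ f(\Delta X) = \mathbf{L}^{-1}(\Delta Q) + \mathbf{L}^{-1}(h(\Delta A, \Delta B)). \]

Obviously, $f(\Delta X)$ can be regarded as a continuous mapping from $\mathbb{C}^{n \times n}$ to
$\mathbb{C}^{n \times n}$. Now define

\[ \mathcal{S}_{\delta_*} = \{\Delta X \in \mathbb{C}^{n \times n} : \|\Delta X\| \le \delta_*\}. \tag{6.14} \]

Then for any $\Delta X \in \mathcal{S}_{\delta_*}$, we have

\[
\begin{aligned}
\|f(\Delta X)\| &\le \frac{1}{l}\|\Delta Q\| + \frac{1}{l}\|h(\Delta A, \Delta B)\| \\
&\le \frac{\delta}{l} + \frac{2}{l}\big(\|B\|\|\Delta A\| + \|A\|\|\Delta B\| + \|\Delta A\|\|\Delta B\|\big)\big(\|X\| + \|\Delta X\|\big) \\
&\le \frac{\delta + 2\epsilon\gamma + 2\epsilon\delta_*}{l} = \delta_*.
\end{aligned}
\]

Thus we have proved that $f(\mathcal{S}_{\delta_*}) \subset \mathcal{S}_{\delta_*}$. By the Schauder fixed-point theorem,
there exists a $\Delta X_* \in \mathcal{S}_{\delta_*}$ such that $f(\Delta X_*) = \Delta X_*$, i.e., there exists a solution $\Delta X_*$
to the perturbed equation (6.8) such that

\[ \|\Delta X_*\| \le \delta_*. \tag{6.15} \]

Let $\tilde{X} = X + \Delta X_*$. Then $\tilde{X}$ is a solution of the perturbed matrix equation (6.7).
Next we prove that the perturbed matrix equation (6.7) has a unique solution.
Define two linear operators

\[ \mathbf{H}(Y) = (\Delta A)^*YB + (\Delta B)^*YA + A^*Y\Delta B + B^*Y\Delta A + (\Delta A)^*Y\Delta B + (\Delta B)^*Y\Delta A \]

and

\[ \tilde{\mathbf{L}}(Y) = Y - \tilde{A}^*Y\tilde{B} - \tilde{B}^*Y\tilde{A}, \]

respectively, where $Y \in \mathbb{C}^{n \times n}$. Then it is easily seen that

\[ \tilde{\mathbf{L}}(Y) = \mathbf{L}(Y) - \mathbf{H}(Y) = \mathbf{L}(\mathbf{F}(Y)), \]

where

\[ \mathbf{F}(Y) = Y - \mathbf{L}^{-1}(\mathbf{H}(Y)). \]

Note that

\[ \|\mathbf{F}(Y)\| \ge \|Y\| - \|\mathbf{L}^{-1}(\mathbf{H}(Y))\| \ge \|Y\|\big(1 - \|\mathbf{L}^{-1}\|\,\|\mathbf{H}\|\big) \ge \Big(1 - \frac{2\epsilon}{l}\Big)\|Y\|, \]

which guarantees that the linear operator $\mathbf{F}$ is invertible under the condition $2\epsilon < l$,
and hence, the linear operator $\tilde{\mathbf{L}}$ is invertible, which implies that the perturbed
matrix equation (6.7) has a unique solution $\tilde{X}$. Thus, the inequality (6.15) implies
that the inequality (6.13) holds. The proof is completed.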
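
The bound of Theorem 1.1 can be checked numerically on small random examples. The sketch below is an illustration under stated assumptions: it uses the Frobenius norm for every quantity (it is unitarily invariant and submultiplicative, and the induced norm of $\mathbf{L}^{-1}$ then equals the reciprocal of the smallest singular value of the Kronecker-form matrix), and it takes the bound in the form $(\delta + 2\epsilon\|X\|)/(l - 2\epsilon)$, which is what the fixed-point argument in the proof yields. All function names are hypothetical.

```python
import numpy as np

def lyapunov_matrix(A, B):
    """Kronecker form of X -> X - A* X B - B* X A (column-stacking vec)."""
    n = A.shape[0]
    return np.eye(n * n) - np.kron(B.T, A.conj().T) - np.kron(A.T, B.conj().T)

def solve_mixed_lyapunov(A, B, Q):
    """Solve X = A* X B + B* X A + Q, assuming the operator is invertible."""
    n = A.shape[0]
    x = np.linalg.solve(lyapunov_matrix(A, B), Q.reshape(-1, order="F"))
    return x.reshape(n, n, order="F")

def perturbation_bound(A, B, Q, dA, dB, dQ):
    """Bound on ||X_tilde - X||_F from the fixed-point argument (Frobenius norms)."""
    fro = np.linalg.norm                      # Frobenius norm for matrices
    X = solve_mixed_lyapunov(A, B, Q)
    # l = 1 / ||L^{-1}|| is the smallest singular value of the matrix of L
    l = np.linalg.svd(lyapunov_matrix(A, B), compute_uv=False)[-1]
    eps = fro(A) * fro(dB) + fro(B) * fro(dA) + fro(dA) * fro(dB)
    if 2.0 * eps >= l:
        raise ValueError("perturbation too large: need 2*eps < l")
    return (fro(dQ) + 2.0 * eps * fro(X)) / (l - 2.0 * eps)
```

Comparing the returned value with the actual error `np.linalg.norm(X_tilde - X)` for a few small perturbations shows how conservative the bound is for a given problem.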
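Remark 2.1 From Theorem 1.1 we get the absolute perturbation bound of first
order for the unique solution $X$ as follows:

\[ \|\tilde{X} - X\| \le \frac{1}{l}\|\Delta Q\| + \frac{2\alpha\gamma}{l}\|\Delta B\| + \frac{2\beta\gamma}{l}\|\Delta A\| + O\big(\|(\Delta A, \Delta B, \Delta Q)\|_F^2\big), \qquad (\Delta A, \Delta B, \Delta Q) \to 0. \tag{6.16} \]

Combining this with (6.9) gives

\[ \Delta X = \mathbf{L}^{-1}(\Delta Q) + \mathbf{Q}(\Delta A, \Delta B) + O\big(\|(\Delta A, \Delta B, \Delta Q)\|_F^2\big), \qquad (\Delta A, \Delta B, \Delta Q) \to 0, \tag{6.17} \]

where

\[ \mathbf{Q}(\Delta A, \Delta B) = \mathbf{L}^{-1}\big((\Delta A)^*XB + (\Delta B)^*XA + A^*X\Delta B + B^*X\Delta A\big). \tag{6.18} \]

Remark 2.2 Noting that (6.1) implies that

\[ \|Q\| \le (1 + 2\|A\|\,\|B\|)\|X\|, \]

from (6.16) we immediately obtain the first-order relative perturbation bound for the
solution $X$:

\[ \frac{\|\tilde{X} - X\|}{\|X\|} \le \frac{1 + 2\alpha\beta}{l}\left(\frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta B\|}{\|B\|} + \frac{\|\Delta Q\|}{\|Q\|}\right) + O\big(\|(\Delta A, \Delta B, \Delta Q)\|_F^2\big), \qquad (\Delta A, \Delta B, \Delta Q) \to 0. \tag{6.19} \]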



6.3 Condition Numbers 

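We now apply the theory of condition developed by Rice [6] to study condition
numbers of the unique solution $X$ to the mixed-type Lyapunov equation (6.1).

Suppose that the coefficient matrices $A$, $B$, $Q$ are slightly perturbed to $\tilde{A}, \tilde{B}, \tilde{Q} \in
\mathbb{C}^{n \times n}$, respectively, and let

\[ \Delta A = \tilde{A} - A, \qquad \Delta B = \tilde{B} - B, \qquad \Delta Q = \tilde{Q} - Q. \]

From Theorem 1.1 and Remark 2.1 we see that if $\|(\Delta A, \Delta B, \Delta Q)\|_F$ is sufficiently
small, then the unique solution $\tilde{X}$ to the perturbed matrix equation (6.7) exists, and

\[ \Delta X = \tilde{X} - X = \mathbf{L}^{-1}(\Delta Q) + \mathbf{Q}(\Delta A, \Delta B) + O\big(\|(\Delta A, \Delta B, \Delta Q)\|_F^2\big), \tag{6.20} \]

as $(\Delta A, \Delta B, \Delta Q) \to 0$, where $\mathbf{Q}$ is defined by (6.18).

By the theory of condition developed by Rice [6] we define the condition
number of the unique solution $X$ by

\[ c(X) = \lim_{\delta \to 0}\ \sup_{\left\|\left(\frac{\Delta A}{\alpha}, \frac{\Delta B}{\beta}, \frac{\Delta Q}{\rho}\right)\right\|_F \le \delta} \frac{\|\Delta X\|_F}{\xi\delta}, \tag{6.21} \]

where $\xi$, $\alpha$, $\beta$ and $\rho$ are positive parameters. Taking $\xi = \alpha = \beta = \rho = 1$ in (6.21)
gives the absolute condition number $c_{\mathrm{abs}}(X)$, and taking $\xi = \|X\|_F$, $\alpha = \|A\|_F$,
$\beta = \|B\|_F$, $\rho = \|Q\|_F$ in (6.21) gives the relative condition number $c_{\mathrm{rel}}(X)$.
Substituting (6.20) into (6.21) we get

\[
\begin{aligned}
c(X) &= \frac{1}{\xi} \max_{\substack{(\Delta A, \Delta B, \Delta Q) \neq 0 \\ \Delta A, \Delta B, \Delta Q \in \mathbb{C}^{n \times n}}} \frac{\|\mathbf{L}^{-1}(\Delta Q) + \mathbf{Q}(\Delta A, \Delta B)\|_F}{\left\|\left(\frac{\Delta A}{\alpha}, \frac{\Delta B}{\beta}, \frac{\Delta Q}{\rho}\right)\right\|_F} \\
&= \frac{1}{\xi} \max_{\substack{(E, M, N) \neq 0 \\ E, M, N \in \mathbb{C}^{n \times n}}} \frac{\|\mathbf{L}^{-1}(G(E, M, N))\|_F}{\|(E, M, N)\|_F},
\end{aligned} \tag{6.22}
\]

where

\[ G(E, M, N) = \rho N + \alpha(E^*XB + B^*XE) + \beta(A^*XM + M^*XA). \]

Let $L$ be the matrix representation of the linear operator $\mathbf{L}$. Then it follows from
(6.6) that

\[ L = I \otimes I - B^T \otimes A^* - A^T \otimes B^*. \tag{6.23} \]

Let

\[
\begin{aligned}
L^{-1}(I \otimes (B^*X)) &= U_1 + i\Omega_1, & L^{-1}((XB)^T \otimes I)\Pi &= U_2 + i\Omega_2, \\
L^{-1}(I \otimes (A^*X)) &= V_1 + i\Gamma_1, & L^{-1}((XA)^T \otimes I)\Pi &= V_2 + i\Gamma_2,
\end{aligned} \tag{6.24}
\]

where $U_1, U_2, \Omega_1, \Omega_2, V_1, V_2, \Gamma_1, \Gamma_2 \in \mathbb{R}^{n^2 \times n^2}$, and $\Pi$ is the vec-permutation
matrix (see [3, p. 32-34]), i.e.,

\[ \operatorname{vec}(E^T) = \Pi \operatorname{vec}(E). \]

Moreover, let $L^{-1} = S + i\Sigma$ with $S, \Sigma \in \mathbb{R}^{n^2 \times n^2}$, and

\[ S_c = \begin{bmatrix} S & -\Sigma \\ \Sigma & S \end{bmatrix}, \qquad U_c = \begin{bmatrix} U_1 + U_2 & \Omega_2 - \Omega_1 \\ \Omega_1 + \Omega_2 & U_1 - U_2 \end{bmatrix}, \qquad V_c = \begin{bmatrix} V_1 + V_2 & \Gamma_2 - \Gamma_1 \\ \Gamma_1 + \Gamma_2 & V_1 - V_2 \end{bmatrix}. \tag{6.25} \]

Then the following theorem immediately follows from (6.22).

Theorem 1.2 The condition number $c(X)$ defined by (6.21) has the explicit
expression

\[ c(X) = \frac{1}{\xi}\,\|(\alpha U_c, \beta V_c, \rho S_c)\|_2, \tag{6.26} \]

where the matrices $U_c$, $V_c$ and $S_c$ are defined by (6.23)-(6.25).

Remark 3.1 From Theorem 1.2 we have the relative condition number

\[ c_{\mathrm{rel}}(X) = \frac{\big\|\big(\|A\|_F U_c,\ \|B\|_F V_c,\ \|Q\|_F S_c\big)\big\|_2}{\|X\|_F}. \]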


6.4 Backward Error 

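Let $\tilde{X} \in \mathbb{C}^{n \times n}$ be an approximation to the unique solution $X$ of the mixed-type
Lyapunov equation (6.1), and let $\Delta A$, $\Delta B$, and $\Delta Q$ be the corresponding
perturbations of the coefficient matrices $A$, $B$, and $Q$ in (6.1). A backward error of the
approximate solution $\tilde{X}$ can be defined by

\[ \eta(\tilde{X}) = \min\Big\{ \Big\|\Big(\frac{\Delta A}{\alpha}, \frac{\Delta B}{\beta}, \frac{\Delta Q}{\rho}\Big)\Big\|_F : \Delta A, \Delta B, \Delta Q \in \mathbb{C}^{n \times n},\ \tilde{X} - (A + \Delta A)^*\tilde{X}(B + \Delta B) - (B + \Delta B)^*\tilde{X}(A + \Delta A) = Q + \Delta Q \Big\}, \tag{6.27} \]

where $\alpha$, $\beta$ and $\rho$ are positive parameters. Taking $\alpha = \beta = \rho = 1$ in (6.27) gives
the absolute backward error $\eta_{\mathrm{abs}}(\tilde{X})$, and taking $\alpha = \|A\|_F$, $\beta = \|B\|_F$, $\rho = \|Q\|_F$
in (6.27) gives the relative backward error $\eta_{\mathrm{rel}}(\tilde{X})$.
Let

\[ R = Q - \tilde{X} + A^*\tilde{X}B + B^*\tilde{X}A. \tag{6.28} \]

Then from

\[ \tilde{X} - (A + \Delta A)^*\tilde{X}(B + \Delta B) - (B + \Delta B)^*\tilde{X}(A + \Delta A) = Q + \Delta Q, \]

we get

\[ (\Delta A)^*\tilde{X}B + (\Delta B)^*\tilde{X}A + A^*\tilde{X}\Delta B + B^*\tilde{X}\Delta A + \Delta Q = -R - (\Delta A)^*\tilde{X}\Delta B - (\Delta B)^*\tilde{X}\Delta A, \tag{6.29} \]

which shows that the problem of finding an explicit expression of the backward
error $\eta(\tilde{X})$ defined by (6.27) is an optimization problem subject to a nonlinear
constraint. It seems to be difficult to derive an explicit expression for the backward
error $\eta(\tilde{X})$. In this section we only give some estimates for $\eta(\tilde{X})$.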
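Define

\[
\begin{aligned}
&\operatorname{vec}(\Delta A) = x_1 + i y_1, \quad \operatorname{vec}(\Delta B) = x_2 + i y_2, \quad \operatorname{vec}(\Delta Q) = x_3 + i y_3, \\
&\operatorname{vec}(R) = r + i s, \quad \operatorname{vec}\big( (\Delta A)^{*} \tilde{X} \Delta B + (\Delta B)^{*} \tilde{X} \Delta A \big) = a + i b, \\
&I \otimes (B^{*} \tilde{X}) = U_1 + i \Omega_1, \quad \big( (\tilde{X} B)^{T} \otimes I \big) \Pi = U_2 + i \Omega_2, \\
&I \otimes (A^{*} \tilde{X}) = V_1 + i \Gamma_1, \quad \big( (\tilde{X} A)^{T} \otimes I \big) \Pi = V_2 + i \Gamma_2, \\
&U_c = \begin{pmatrix} U_1 + U_2 & \Omega_2 - \Omega_1 \\ \Omega_1 + \Omega_2 & U_1 - U_2 \end{pmatrix}, \quad
V_c = \begin{pmatrix} V_1 + V_2 & \Gamma_2 - \Gamma_1 \\ \Gamma_1 + \Gamma_2 & V_1 - V_2 \end{pmatrix}, \\
&T_c = (\alpha U_c,\ \beta V_c,\ \rho I_{2n^2}), \quad
g = \Big( \frac{x_1^T}{\alpha}, \frac{y_1^T}{\alpha}, \frac{x_2^T}{\beta}, \frac{y_2^T}{\beta}, \frac{x_3^T}{\rho}, \frac{y_3^T}{\rho} \Big)^T,
\end{aligned} \tag{6.30}
\]

where $\Pi$ is the vec-permutation. Using these symbols (6.29) can be rewritten as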



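\[ T_c g = -\begin{pmatrix} r \\ s \end{pmatrix} - \begin{pmatrix} a \\ b \end{pmatrix}. \tag{6.31} \]

Since $\rho > 0$, the $2n^2 \times 6n^2$ matrix $T_c$ is of full row rank, and hence $T_c T_c^{\dagger} = I_{2n^2}$,
which implies that every solution to the equation

\[ g = -T_c^{\dagger} \begin{pmatrix} r \\ s \end{pmatrix} - T_c^{\dagger} \begin{pmatrix} a \\ b \end{pmatrix} \tag{6.32} \]

must be a solution to the equation (6.31). Consequently, for any solution $g$ to
Eq. (6.32) we have

\[ \eta(\tilde{X}) \le \|g\|_2. \tag{6.33} \]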


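Let

\[ \gamma = \Big\| T_c^{\dagger} \begin{pmatrix} r \\ s \end{pmatrix} \Big\|_2, \qquad \tau = \frac{1}{\| T_c^{\dagger} \|_2}, \qquad \mu = \| \tilde{X} \|_2, \tag{6.34} \]

and define

\[ \mathcal{L}(g) = -T_c^{\dagger} \begin{pmatrix} r \\ s \end{pmatrix} - T_c^{\dagger} \begin{pmatrix} a \\ b \end{pmatrix}. \tag{6.35} \]

Then we have

\[
\begin{aligned}
\| \mathcal{L}(g) \|_2
&\le \gamma + \frac{1}{\tau} \Big\| \begin{pmatrix} a \\ b \end{pmatrix} \Big\|_2
 = \gamma + \frac{1}{\tau} \big\| (\Delta A)^{*} \tilde{X} \Delta B + (\Delta B)^{*} \tilde{X} \Delta A \big\|_F \\
&\le \gamma + \frac{2 \alpha \beta \mu}{\tau} \Big\| \frac{\Delta A}{\alpha} \Big\|_F \Big\| \frac{\Delta B}{\beta} \Big\|_F
 \le \gamma + \frac{\alpha \beta \mu}{\tau} \Big( \Big\| \frac{\Delta A}{\alpha} \Big\|_F^2 + \Big\| \frac{\Delta B}{\beta} \Big\|_F^2 \Big) \\
&\le \gamma + \frac{\alpha \beta \mu}{\tau} \| g \|_2^2.
\end{aligned} \tag{6.36}
\]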
Consider the equation

\[ \xi = \gamma + \frac{\alpha \beta \mu}{\tau} \xi^2. \tag{6.37} \]

It is easy to verify that if

\[ \gamma < \frac{\tau}{4 \alpha \beta \mu}, \tag{6.38} \]

then the Eq. (6.37) has the positive number

\[ \xi_1 = \frac{2 \gamma \tau}{\tau + \sqrt{\tau^2 - 4 \alpha \beta \mu \gamma \tau}} \tag{6.39} \]

as its smallest positive real root. Thus, it follows from (6.36) that

\[ \| g \|_2 \le \xi_1 \ \Longrightarrow\ \| \mathcal{L}(g) \|_2 \le \xi_1. \tag{6.40} \]

Therefore, by the Schauder fixed-point theorem, there exists a $g_*$ satisfying
$\| g_* \|_2 \le \xi_1$ such that $\mathcal{L}(g_*) = g_*$, which means that $g_*$ is a solution to the
Eq. (6.32), and hence it follows from (6.33) that

\[ \eta(\tilde{X}) \le \| g_* \|_2 \le \xi_1, \tag{6.41} \]

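i.e., $\xi_1$ is an upper bound for $\eta(\tilde{X})$. Next we derive a lower bound for $\eta(\tilde{X})$.
Suppose that $(\Delta A_{\min}, \Delta B_{\min}, \Delta Q_{\min})$ satisfies

\[ \eta(\tilde{X}) = \Big\| \Big( \frac{\Delta A_{\min}}{\alpha}, \frac{\Delta B_{\min}}{\beta}, \frac{\Delta Q_{\min}}{\rho} \Big) \Big\|_F. \tag{6.42} \]

Then we have

\[ T_c g_{\min} = -\begin{pmatrix} r \\ s \end{pmatrix} - \begin{pmatrix} a_{\min} \\ b_{\min} \end{pmatrix}, \tag{6.43} \]

where

\[
\begin{aligned}
&a_{\min} + i b_{\min} = \operatorname{vec}\big( (\Delta A_{\min})^{*} \tilde{X} \Delta B_{\min} + (\Delta B_{\min})^{*} \tilde{X} \Delta A_{\min} \big), \\
&x_{1,\min} + i y_{1,\min} = \operatorname{vec}(\Delta A_{\min}), \quad x_{2,\min} + i y_{2,\min} = \operatorname{vec}(\Delta B_{\min}), \\
&x_{3,\min} + i y_{3,\min} = \operatorname{vec}(\Delta Q_{\min}), \\
&g_{\min} = \Big( \frac{x_{1,\min}^T}{\alpha}, \frac{y_{1,\min}^T}{\alpha}, \frac{x_{2,\min}^T}{\beta}, \frac{y_{2,\min}^T}{\beta}, \frac{x_{3,\min}^T}{\rho}, \frac{y_{3,\min}^T}{\rho} \Big)^T.
\end{aligned}
\]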

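Let $T_c = W (D, 0) Z^T$ be a singular value decomposition, where $W$ and $Z$ are
orthogonal matrices, $D = \operatorname{diag}(d_1, d_2, \ldots, d_{2n^2})$ with $d_1 \ge \cdots \ge d_{2n^2} > 0$.
Substituting this decomposition into (6.43), and letting

\[ Z^T g_{\min} = \begin{pmatrix} v \\ * \end{pmatrix}, \qquad v \in \mathbb{R}^{2n^2}, \]

we get

\[ v = -D^{-1} W^T \begin{pmatrix} r \\ s \end{pmatrix} - D^{-1} W^T \begin{pmatrix} a_{\min} \\ b_{\min} \end{pmatrix}. \]

Then we have

\[
\begin{aligned}
\eta(\tilde{X}) = \| g_{\min} \|_2 \ge \| v \|_2
&\ge \Big\| D^{-1} W^T \begin{pmatrix} r \\ s \end{pmatrix} \Big\|_2 - \Big\| D^{-1} W^T \begin{pmatrix} a_{\min} \\ b_{\min} \end{pmatrix} \Big\|_2 \\
&\ge \gamma - \| T_c^{\dagger} \|_2 \big\| (\Delta A_{\min})^{*} \tilde{X} \Delta B_{\min} + (\Delta B_{\min})^{*} \tilde{X} \Delta A_{\min} \big\|_F \\
&\ge \gamma - \frac{2 \alpha \beta \mu}{\tau} \Big\| \frac{\Delta A_{\min}}{\alpha} \Big\|_F \Big\| \frac{\Delta B_{\min}}{\beta} \Big\|_F
 \ge \gamma - \frac{\alpha \beta \mu}{\tau} \xi_1^2,
\end{aligned}
\]

in which the last inequality follows from the fact that

\[
2 \Big\| \frac{\Delta A_{\min}}{\alpha} \Big\|_F \Big\| \frac{\Delta B_{\min}}{\beta} \Big\|_F
\le \Big\| \Big( \frac{\Delta A_{\min}}{\alpha}, \frac{\Delta B_{\min}}{\beta}, \frac{\Delta Q_{\min}}{\rho} \Big) \Big\|_F^2
= \eta(\tilde{X})^2 \le \xi_1^2.
\]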



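Let now

\[ l(\gamma) = \gamma - \frac{\alpha \beta \mu}{\tau} \xi_1^2, \qquad \xi_1 = \frac{2 \gamma \tau}{\tau + \sqrt{\tau^2 - 4 \alpha \beta \mu \gamma \tau}}. \tag{6.44} \]

If we can prove that $l(\gamma) > 0$, then (6.44) gives a useful lower bound for $\eta(\tilde{X})$.
Therefore, we now turn to proving that $l(\gamma) > 0$. Since $\xi_1$ is a solution to the Eq.
(6.37), we have

\[ \xi_1 = \gamma + \frac{\alpha \beta \mu}{\tau} \xi_1^2, \]

and hence we have

\[ l(\gamma) = 2 \gamma - \xi_1 = \frac{2 \gamma \sqrt{\tau^2 - 4 \alpha \beta \mu \gamma \tau}}{\tau + \sqrt{\tau^2 - 4 \alpha \beta \mu \gamma \tau}} > 0. \]

In summary, we have proved the following theorem.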

In summary, we have proved the following theorem. 
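Theorem 1.3 Let $A, B, Q, \tilde{X} \in \mathbb{C}^{n \times n}$ be given matrices, $\eta(\tilde{X})$ be the backward
error defined by (6.27), and let the scalars $\gamma$, $\tau$, $\mu$ be defined by (6.34). If
$\gamma < \tau / (4 \alpha \beta \mu)$, then we have

\[ 0 < l(\gamma) \le \eta(\tilde{X}) \le u(\gamma), \tag{6.45} \]

where

\[ u(\gamma) = \frac{2 \gamma \tau}{\tau + \sqrt{\tau^2 - 4 \alpha \beta \mu \gamma \tau}}, \qquad l(\gamma) = \gamma - \frac{\alpha \beta \mu}{\tau} u^2(\gamma). \tag{6.46} \]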



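Remark 4.1 The functions $u(\gamma)$ and $l(\gamma)$ defined by (6.46) have the Taylor
expansions

\[ u(\gamma) = \gamma + \frac{\alpha \beta \mu}{\tau} \gamma^2 + O(\gamma^3) \]

and

\[ l(\gamma) = \gamma - \frac{\alpha \beta \mu}{\tau} \gamma^2 + O(\gamma^3), \]

respectively. Consequently, when $\gamma$ is sufficiently small, we have

\[ \gamma - \frac{\alpha \beta \mu}{\tau} \gamma^2 \lesssim \eta(\tilde{X}) \lesssim \gamma + \frac{\alpha \beta \mu}{\tau} \gamma^2. \tag{6.47} \]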
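The bounds of Theorem 1.3 are inexpensive to evaluate once the scalars of (6.34) are available. The following Python sketch is an illustration only and is not taken from the chapter: the function name `backward_error_bounds` and its argument names are assumptions, and the values of gamma, tau and mu are assumed to have been computed beforehand from (6.34).

```python
import math

def backward_error_bounds(gamma, tau, mu, alpha=1.0, beta=1.0):
    """Return (l, u) such that l <= eta <= u, following (6.45)-(6.46).

    gamma, tau, mu are the scalars of (6.34); alpha and beta are the
    positive scaling parameters of the backward error (6.27).
    """
    kappa = alpha * beta * mu
    # Hypothesis of the theorem: gamma < tau / (4 * alpha * beta * mu).
    if not 4.0 * kappa * gamma < tau:
        raise ValueError("requires gamma < tau / (4 * alpha * beta * mu)")
    disc = math.sqrt(tau * tau - 4.0 * kappa * gamma * tau)
    u = 2.0 * gamma * tau / (tau + disc)   # smallest positive root of (6.37)
    l = gamma - kappa / tau * u * u        # lower bound of (6.46)
    return l, u

if __name__ == "__main__":
    lower, upper = backward_error_bounds(gamma=1.0e-2, tau=1.0, mu=1.0)
    print(lower, upper)
```

Since u solves (6.37), the lower bound can equivalently be written as l = 2*gamma - u, which gives a quick consistency check on the two returned values.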



6.5 Numerical Examples 



To illustrate the results of the previous sections, in this section some simple
examples are given, which were carried out using MATLAB 6.5 on a Pentium
IV 1.7 GHz PC, with machine epsilon $\epsilon \approx 2.2 \times 10^{-16}$.

Example 5.1 Consider the mixed-type Lyapunov matrix equation (6.1) with
$A = \alpha J$, $B = \beta J$, $X = 4 I_2$ and $Q = X - A^{*} X B - B^{*} X A$, where

\[ J = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}. \]

Take $\alpha = \beta = 0.5$, and suppose that the perturbations in the coefficient matrices
are

\[
\Delta A_k = 10^{-k} \times \begin{pmatrix} 0.901 & 0.402 \\ 0.332 & 0.451 \end{pmatrix}, \quad
\Delta B_k = 10^{-k} \times \begin{pmatrix} 0.778 & 0.231 \\ -0.343 & 0.225 \end{pmatrix}, \quad
\Delta Q_k = 10^{-k} \times \begin{pmatrix} 0.401 & 0.225 \\ 0.331 & -0.429 \end{pmatrix}.
\]

In this case the relative condition number is $c_{\mathrm{rel}}(X) = 8.6603$, computed by
the formula given in Remark 3.1. By Theorem 1.1, we can compute perturbation
bounds $\delta_*^{(k)}$:

\[ \| \tilde{X}^{(k)} - X \| \le \delta_*^{(k)}, \]

where $\tilde{X}^{(k)}$ are the solutions of the mixed-type Lyapunov equation (6.1) with the
coefficient matrices $A_k = A + \Delta A_k$, $B_k = B + \Delta B_k$ and $Q_k = Q + \Delta Q_k$, respectively.
Some results are listed in Table 6.1.

On the other hand, take $\alpha = 0.5$ and $\beta = 0.9998$. In this case the relative
condition number is $c_{\mathrm{rel}}(X) \approx 32396.61$. This shows that the solution $X$ is
ill-conditioned. However, we can still compute the perturbation bounds, which are
shown in Table 6.2.

The results listed in Tables 6.1 and 6.2 show that the perturbation bound given
by Theorem 1.1 is relatively sharp.
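The actual error in Example 5.1 can be recomputed independently of MATLAB. The sketch below is an illustration and not the authors' code: all helper names are assumptions, the matrices are real so that the conjugate transpose reduces to the transpose, the perturbed equation is solved by assembling the n^2 x n^2 matrix of the linear operator, and the spectral norm is assumed for the error because it is the norm that reproduces the leading tabulated value.

```python
import math

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def mixed_lyap(A, B, X):
    """Evaluate X - A^T X B - B^T X A for real square matrices."""
    T1 = matmul(matmul(transpose(A), X), B)
    T2 = matmul(matmul(transpose(B), X), A)
    n = len(X)
    return [[X[i][j] - T1[i][j] - T2[i][j] for j in range(n)] for i in range(n)]

def solve_linear(K, rhs):
    """Gaussian elimination with partial pivoting for K x = rhs."""
    m = len(rhs)
    M = [K[i][:] + [rhs[i]] for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (M[i][m] - sum(M[i][j] * x[j] for j in range(i + 1, m))) / M[i][i]
    return x

def solve_mixed_lyap(A, B, Q):
    """Solve X - A^T X B - B^T X A = Q via the matrix of the operator."""
    n = len(Q)
    cols = []
    for i in range(n):
        for j in range(n):
            E = [[1.0 if (r, c) == (i, j) else 0.0 for c in range(n)] for r in range(n)]
            img = mixed_lyap(A, B, E)          # image of a basis matrix
            cols.append([img[r][c] for r in range(n) for c in range(n)])
    K = [[cols[c][r] for c in range(n * n)] for r in range(n * n)]
    x = solve_linear(K, [Q[r][c] for r in range(n) for c in range(n)])
    return [x[r * n:(r + 1) * n] for r in range(n)]

def norm2_2x2(M):
    """Spectral norm of a real 2 x 2 matrix."""
    S = matmul(transpose(M), M)
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0))) / 2.0)

def example_error(k, a=0.5, b=0.5):
    """Return ||X_k - X||_2 for the data of Example 5.1."""
    J = [[2.0, -1.0], [-1.0, 2.0]]
    A = [[a * v for v in row] for row in J]
    B = [[b * v for v in row] for row in J]
    X = [[4.0, 0.0], [0.0, 4.0]]
    Q = mixed_lyap(A, B, X)
    s = 10.0 ** (-k)
    dA = [[0.901, 0.402], [0.332, 0.451]]
    dB = [[0.778, 0.231], [-0.343, 0.225]]
    dQ = [[0.401, 0.225], [0.331, -0.429]]
    perturb = lambda P, D: [[P[i][j] + s * D[i][j] for j in range(2)] for i in range(2)]
    Xk = solve_mixed_lyap(perturb(A, dA), perturb(B, dB), perturb(Q, dQ))
    return norm2_2x2([[Xk[i][j] - X[i][j] for j in range(2)] for i in range(2)])

if __name__ == "__main__":
    print(example_error(6))               # compare with the k = 6 error in Table 6.1
    print(example_error(10, 0.5, 0.9998)) # compare with the k = 10 error in Table 6.2
```

Applying the operator to the basis matrices avoids any dependence on a particular Kronecker-product or vec ordering convention, at the cost of forming a dense n^2 x n^2 system, which is acceptable only for small n.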



Example 5.2 Consider the mixed-type Lyapunov equation (6.1) with the coefficient 
matrices A^\J,B^ /?^7, X^hwAQ^X- A*XB - B*XA, where 



ft = 1 - 10 



-k 



J = 



1 

1 



Suppose that the perturbations in the coefficient matrices are

\[
\Delta A = 10^{-10} \times \begin{pmatrix} 0.491 & 0.962 \\ 0.342 & 0.471 \end{pmatrix}, \quad
\Delta B = 10^{-10} \times \begin{pmatrix} 0.478 & 0.232 \\ 0.413 & -0.535 \end{pmatrix}, \quad
\Delta Q = 10^{-10} \times \begin{pmatrix} 0.128 & 0.625 \\ 0.331 & -0.429 \end{pmatrix}.
\]

Some numerical results on the relative perturbation bounds $\delta_* / \|X\|$, $r_*$ and $c_{\mathrm{rel}}(X)$
are shown in Table 6.3, where $\delta_*$ is as in (6.13), $r_* = \| \tilde{X} - X \| / \|X\|$, and $c_{\mathrm{rel}}(X)$ is
given in Remark 3.1. The results listed in Table 6.3 show that the relative
perturbation bound $\delta_* / \|X\|$ is fairly sharp, even when the solution $X$ is
ill-conditioned.



Table 6.1

k                          6                7                8                9                10
‖X̃^(k) − X‖               1.854 × 10^{-5}  1.854 × 10^{-6}  1.854 × 10^{-7}  1.854 × 10^{-8}  1.854 × 10^{-9}
δ_*^(k)                    5.313 × 10^{-5}  5.313 × 10^{-6}  5.313 × 10^{-7}  5.313 × 10^{-8}  5.313 × 10^{-9}

Table 6.2

k                          6                7                8                9                10
‖X̃^(k) − X‖               5.261 × 10^{-2}  5.201 × 10^{-3}  5.195 × 10^{-4}  5.194 × 10^{-5}  5.194 × 10^{-6}
δ_*^(k)                    2.154 × 10^{-1}  2.056 × 10^{-2}  2.046 × 10^{-3}  2.045 × 10^{-4}  2.045 × 10^{-5}
Table 6.3

k                 1                2                3                4                5
δ_*/‖X‖           2.667 × 10^{-9}  2.792 × 10^{-8}  2.805 × 10^{-7}  2.806 × 10^{-6}  2.806 × 10^{-5}
r_*               2.416 × 10^{-9}  2.619 × 10^{-8}  2.640 × 10^{-7}  2.642 × 10^{-6}  2.642 × 10^{-5}
c_rel(X)          12.77            140.01           1.412 × 10^{3}   1.414 × 10^{4}   1.414 × 10^{5}

Table 6.4

j     ‖X̃ − X‖_F          γ                   l(γ)                u(γ)
1     0.1149              0.0149              0.0144              0.0154
3     0.1149 × 10^{-2}    0.1525 × 10^{-3}    0.1524 × 10^{-3}    0.1525 × 10^{-3}
5     0.1149 × 10^{-4}    0.1525 × 10^{-5}    0.1525 × 10^{-5}    0.1525 × 10^{-5}
7     0.1149 × 10^{-6}    0.1525 × 10^{-7}    0.1525 × 10^{-7}    0.1525 × 10^{-7}
9     0.1149 × 10^{-8}    0.1525 × 10^{-9}    0.1525 × 10^{-9}    0.1525 × 10^{-9}



Example 5.3 Consider the mixed-type Lyapunov equation (6.1) with the coefficient 
matrices 






1 


1 


1 


-1 


1 



Q = X- A*XB - B*XA, 



where $X = \operatorname{diag}(1, 2, 3)$ and $B$ is the $3 \times 3$ Hilbert matrix. Let now

\[ \tilde{X} = X + 10^{-j} \times \begin{pmatrix} 0.5 & -0.1 & 0.2 \\ -0.1 & 0.3 & 0.6 \\ 0.2 & 0.6 & -0.4 \end{pmatrix} \]

be an approximate solution. Take $\alpha = \|A\|_F$, $\beta = \|B\|_F$ and $\rho = \|Q\|_F$ in
Theorem 1.3. Some numerical results on lower and upper bounds for the backward
error $\eta(\tilde{X})$ are displayed in Table 6.4.

From the results listed in Table 6.4 we see that the backward error of X 
decreases as the error \\X — X\\p decreases, and moreover, we see that for smaller y 
(e.g., y < IQ-"^) we can get a quite better estimate for the backward error r]{X) by 
taking y as an approximation to u{y) or l{y). 
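The error column of Table 6.4 depends only on the perturbation matrix of Example 5.3 and can be checked directly. The short Python sketch below is an added illustration; the function name `frobenius_error` is an assumption, and the exponent of the perturbation is written `j` here because the symbol is not legible in the source.

```python
import math

# Perturbation matrix of Example 5.3 (the factor multiplying 10^{-j}).
M = [[0.5, -0.1, 0.2],
     [-0.1, 0.3, 0.6],
     [0.2, 0.6, -0.4]]

def frobenius_error(j):
    """Frobenius norm of 10^{-j} * M, i.e. of the difference X_tilde - X."""
    return 10.0 ** (-j) * math.sqrt(sum(v * v for row in M for v in row))

if __name__ == "__main__":
    for j in (1, 3, 5, 7, 9):
        print(j, frobenius_error(j))
```

The Frobenius norm of M is the square root of 1.32, so each printed value is that constant scaled by 10^{-j}.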



6.6 Conclusion 



In this paper we first give a perturbation bound for the solution X to the mixed-type
Lyapunov equation (6.1). Then we derive an explicit expression of the condition
number for the solution X to the mixed-type Lyapunov equation (6.1). Moreover,
we give upper and lower bounds of the backward error for an approximate
solution to the mixed-type Lyapunov equation (6.1). Numerical examples show
that our estimates are fairly sharp.
Chapter 7

Numerical and Symbolical Methods for the GCD of Several Polynomials



Dimitrios Christou, Nicos Karcanias, Marilena Mitrouli 
and Dimitrios Triantafyllou 



Abstract The computation of the Greatest Common Divisor (GCD) of a set of
polynomials is an important issue in computational mathematics and it is strongly
linked to Control Theory. In this paper we present different matrix-based
methods, which have been developed for the efficient computation of the GCD of
several polynomials. Some of these methods are naturally suited to dealing with
numerical inaccuracies in the input data and produce meaningful approximate
results. We therefore describe methods such as the ERES, the Matrix Pencil and
other resultant type methods, and compare them numerically and symbolically
with respect to their complexity and effectiveness. The combination of numerical
and symbolic operations suggests a new approach in software mathematical
computations denoted as hybrid computations. This combination offers great
advantages, especially when we are interested in finding approximate solutions.
Finally the notion of approximate GCD is discussed and a useful criterion
estimating the strength of a given approximate GCD is also developed.
7.1 Introduction

The problem of finding the greatest common divisor (GCD) of a polynomial set
has been a subject of interest for a very long time and has widespread applications.
Since the existence of a nontrivial common divisor of polynomials is a property
that holds for specific sets, extra care is needed in the development of efficient
numerical algorithms calculating correctly the required GCD. Several numerical
methods for the computation of the GCD of a set $\mathcal{P}_{m,n}$ of m polynomials of
$\mathbb{R}[s]$ of maximal degree n have been proposed; see [2-4, 11, 20, 24, 26, 28-30, 32, 33, 36]
and references therein. These methods can be classified as:

(i) Numerical methods based on Euclid's algorithm and its generalizations, 
(ii) Numerical methods based on procedures involving matrices (matrix based 
methods). 

The methods that are based on Euclid's algorithm are designed for processing
two polynomials and they are applied iteratively for sets of more than two
polynomials. On the other hand, the matrix-based methods usually perform specific
transformations to a matrix formed directly from the coefficients of the
polynomials of the entire given set.

The GCD has a significant role in Control Theory [15, 31]. A number of
important invariants for Linear Systems rely on the notion of the Greatest Common
Divisor (GCD) of several polynomials. In fact, it is instrumental in defining system
notions such as zeros, decoupling zeros, zeros at infinity or notions of Minimality
of system representations. On the other hand, Systems and Control Methods
provide concepts and tools, which enable the development of new computational
procedures for the GCD [16].

The existence of certain types and/or values of invariants and system properties
may be classified as generic or nongeneric on a family of linear models. Numerical
computations dealing with the derivation of an approximate value of a property
or function that is nongeneric on a given model set will be called nongeneric
computations (NGC) [16]. Computational procedures aiming at defining the
generic value of a property or function on a given model set (if such a value exists)
will be called generic computations (GC). On a set of polynomials with coefficients
taking values from a certain parameter set, the existence of a GCD is nongeneric
[14, 35]; numerical procedures that aim to produce an approximate nontrivial value
by exploring the numerical properties of the parameter set are typical examples of
NGC, and approximate GCD procedures will be considered subsequently. NGC
refer to both continuous and discrete type system invariants. The various
techniques that have been developed for the computation of approximate solutions
of the GCD [19, 26] and LCM (Least Common Multiple) [16, 21, 22] are based on
methodologies where exact properties of these notions are relaxed and appropriate
solutions are sought using a variety of numerical tests.
The development of a methodology for robust computation of nongeneric
algebraic invariants, or nongeneric values of generic ones, has as prerequisites:

(a) The development of a numerical linear algebra characterization of the
invariants, which may allow the measurement of the degree of presence of the
property on every point of the parameter set.

(b) The development of special numerical tools, which avoid the introduction of
additional errors.

(c) The formulation of appropriate criteria, which allow the termination of
algorithms at certain steps and the definition of meaningful approximate solutions
to the algebraic computation problem.

It is clear that the formulation of the algebraic problem as an equivalent
numerical linear algebra problem is essential in transforming concepts of
algebraic nature to equivalent concepts of analytic character and thus setting up the
right framework for approximations.

A major challenge for the control theoretic applications of the GCD is that
frequently we have to deal with a very large number of polynomials. It is this
requirement that makes the pairwise type approaches for the GCD [1, 24, 28, 36]
unsuitable for such applications [32]. In contrast, because they use the entire set of
polynomials, matrix-based methods tend to have better performance and quite
good numerical stability, especially in the case of large sets of polynomials [2-4,
10, 20, 26].

The study of the invariance properties of the GCD [18] led to the development 
of the ERES method [26], which performs extensive row operations and shifting 
on a matrix formed directly from the coefficients of the polynomials. The ERES 
method has also introduced for the first time a systematic procedure for computing 
approximate GCDs [19] for a set of polynomials and extends the previously 
defined notion of almost zeros [17]. The notion of almost zeros is linked to the 
approximate GCD problem [19] and it is based on a relaxation of the exact notion 
of a zero. 

Another algorithm for the GCD computation, based on Systems Theory and
Matrix Pencils, was introduced in 1994 [20], and a variant using generalized
resultant matrices was presented in 2006 [23].

The implementation of matrix-based methods computing the GCD in a
programming environment often needs a careful selection of the proper arithmetic
system. Most modern mathematical software packages use variable floating-point
or exact symbolic arithmetic. If symbolic arithmetic is used, the results are always
accurate, but the time of execution of the algorithms can be prohibitively high. In
variable precision floating-point arithmetic the internal accuracy of the system can
be determined by the user. Variable precision operations are faster and more
economical in memory than symbolic operations, but if we increase the
number of digits of the system's accuracy, the time and memory requirements will
also increase. An alternative approach is to combine symbolic with floating-point
operations of enough digits of accuracy in an appropriate way. Such a combination
will be referred to as Hybrid Computations. This technique often improves the
performance of the matrix based algorithms.
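The difference between the two arithmetic systems is easy to observe on a single elimination step. In the following Python sketch, added here for illustration, the standard `fractions` module stands in for the rational (symbolic) arithmetic of a computer algebra system; this substitution and the helper name `eliminate` are assumptions and do not describe the software used by the authors.

```python
from fractions import Fraction

def eliminate(pivot_row, row):
    """Subtract the multiple of pivot_row that zeroes the first entry of row."""
    factor = row[0] / pivot_row[0]
    return [r - factor * p for p, r in zip(pivot_row, row)]

# The second row is exactly three times the first one, so the exact
# result of the elimination step is the zero row.
exact = eliminate([Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)],
                  [Fraction(3, 10), Fraction(6, 10), Fraction(9, 10)])
approx = eliminate([0.1, 0.2, 0.3], [0.3, 0.6, 0.9])

if __name__ == "__main__":
    print(exact)   # rational arithmetic: every entry is exactly zero
    print(approx)  # floating point: entries may be tiny round-off residues
```

In rational arithmetic a dependent row is recognised as exactly zero, whereas in floating point the decision requires a tolerance; this is the reason the rank decisions of a hybrid method are delegated to a numerical tool such as the SVD.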

In the following, we will be mainly concerned with the performance of the 
ERES, Matrix Pencil, and Resultant ERE methods in a numerical-symbolical 
computational environment. Also, a useful indicator for the quality of the GCD, 
known as the strength of an approximate GCD [19], is described. 



7.2 The ERES, Resultant ERE (RERE) and Modified RERE 
(MRERE) Methods 

In this section, we present the description of the two methods for computing the 
GCD of several polynomials using Extended Row Equivalence (ERE) [16]. Their 
corresponding algorithms are tested and compared thoroughly and representative 
examples are given in tables. 

Suppose that we have a set of m polynomials:

\[
\mathcal{P}_{m,n} = \Big\{ a(s), b_i(s) \in \mathbb{R}[s],\ i = 1, 2, \ldots, m-1, \ \text{with}\ n = \deg\{a(s)\}\ \text{and}\ p = \max_{1 \le i \le m-1} \{ \deg\{ b_i(s) \} \} \le n \Big\}
\]

with the following form:

\[ a(s) = a_n s^n + a_{n-1} s^{n-1} + \cdots + a_1 s + a_0 \]

bi{s) = bijj' + bij,_is'''^ H h bi_n_p+is + bi,„_p 

for i — 1,2, . . .,m — 1 

and suppose that there is at least one /: ii,„ ^ but bij = for j > n — p Vi, [2]. 
For any $\mathcal{P}_{m,n}$ set, we define a vector representative (vr) $p_m(s)$ and a basis matrix
$P_m$ represented as:

\[ p_m(s) = [a(s), b_1(s), \ldots, b_{m-1}(s)]^t = P_m \cdot e_n(s) \tag{7.1a} \]

where $P_m \in \mathbb{R}^{m \times (n+1)}$, $e_n(s) = [1, s, \ldots, s^{n-1}, s^n]^t$.

The basis matrix P,„ is formed directly from the coefficients of the polynomials 
of the set. 

Additionally, for any vector of the form:

\[ r^t = [0, \ldots, 0, a_k, \ldots, a_d] \in \mathbb{R}^d, \quad a_k \ne 0 \]

we define the Shifting operation

\[ \mathrm{shf}: \ \mathrm{shf}(r^t) = [a_k, \ldots, a_d, 0, \ldots, 0] \in \mathbb{R}^d \]

In the following, without loss of generality, we suppose that the GCD of a given
set of polynomials has no zero roots.
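The Shifting operation is a simple rotation of the leading zeros to the end of the vector. A minimal Python sketch follows; the function name `shf` mirrors the notation of the text, while the representation of a row as a Python list is an assumption made for illustration.

```python
def shf(row):
    """Move the leading zeros of a coefficient vector to its end.

    The length of the vector is preserved; an all-zero vector is
    returned unchanged.
    """
    k = 0
    while k < len(row) and row[k] == 0:
        k += 1
    return row[k:] + [0] * k

if __name__ == "__main__":
    print(shf([0, 0, 3, 1, 2]))
```

If the columns follow the ordering of $e_n(s)$, i.e. ascending powers of s, then k leading zeros mean that the polynomial is divisible by $s^k$, and shifting amounts to dividing by $s^k$. This is harmless for the GCD exactly when the GCD has no zero roots, which is the assumption stated above.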



7.2.1 The ERES Method 

The ERES method is an iterative matrix based method, which is based on the
properties of the GCD as an invariant of the original set of polynomials under
extended-row-equivalence and shifting operations [18]. The algorithm of the
ERES method [26, 27] is based on stable algebraic processes, such as Gaussian
elimination with partial pivoting, scaling, normalization and Singular Value
Decomposition, which are applied iteratively on a basis matrix formed directly
from the coefficients of the polynomials of the original set. Thus, the ERES
algorithm works with all the polynomials of a given set simultaneously. The main
target of the ERES algorithm is to reduce the number of the rows of the initial
matrix and finally to end up with a unity rank matrix, which contains the coefficients
of the GCD. The Singular Value Decomposition provides the ERES algorithm
with a termination criterion. The performance of the algorithm is better [5] if we
perform hybrid computations. The following algorithm corresponds to an
implementation of the ERES method in a hybrid computational environment.



7.2.1.1 The ERES Algorithm

- Form the basis matrix $P_m \in \mathbb{R}^{m \times (n+1)}$.

- Convert the elements of $P_m$ to a rational format:
  $P_m^{(0)} := \mathrm{convert}(P_m, \mathrm{rational})$

- Initialize k := -1

- Repeat k := k + 1

  STEP 1: Let r := the row dimension of $P_m^{(k)}$.
          Specify the degree $d_i$ of each polynomial row.
          Reorder matrix $P_m^{(k)}$: $d_{i-1} \le d_i$, i = 2, ..., r.
          If $d_1 = d_2 = \cdots = d_r$ then
              Convert the elements of $P_m^{(k)}$ to a floating-point format:
                  $P_m^{(F)} := \mathrm{convert}(P_m^{(k)}, \mathrm{float})$
              Normalize the rows of $P_m^{(F)}$ using norm-2:
                  $P_m^{(N)} := \mathrm{Normalize}(P_m^{(F)})$
              Compute the singular value decomposition:
                  $P_m^{(N)} := V \Sigma W^t$, $\Sigma = \operatorname{diag}\{\sigma_1, \ldots, \sigma_r\}$,
                  $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$ and $W^t = [w_1, \ldots, w_{n+1}]^t$
              If $\varepsilon_t$-rank$(P_m^{(N)}) = 1$ then
                  Select the GCD vector g.
                  {g := $w_1$ or g := any row of $P_m^{(N)}$}
                  quit

  STEP 2: Scale properly matrix $P_m^{(k)}$.
          Apply Gaussian elimination with partial pivoting to $P_m^{(k)}$.

  STEP 3: Apply shifting on every row of $P_m^{(k)}$.
          Delete the zero rows and columns.

  until r = 1

The ERES algorithm produces either a single row-vector or a unity rank matrix. 
The main advantage of this algorithm is the reduction of the size of the original 
matrix during the iterations, which leads to fast data processing and low memory 
consumption. 
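The exact-arithmetic core of such an iteration (row elimination followed by shifting) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the ERES implementation of [26, 27]: rational arithmetic is provided by the standard `fractions` module, only the lowest-degree row is used as pivot, scaling, normalization and the SVD-based termination test are omitted, and the loop simply stops when a single row is left. Coefficient vectors are assumed to be stored by ascending powers of s, and the GCD is assumed to have no zero roots. The names `trim`, `shift` and `eres_exact` are hypothetical.

```python
from fractions import Fraction

def trim(row):
    """Drop trailing zeros, i.e. zero high-order coefficients."""
    row = list(row)
    while row and row[-1] == 0:
        row.pop()
    return row

def shift(row):
    """Drop leading zeros, i.e. divide by the largest possible power of s."""
    k = 0
    while k < len(row) and row[k] == 0:
        k += 1
    return row[k:]

def eres_exact(polys):
    """Monic GCD (ascending coefficients) of polynomials with exact data."""
    rows = [shift(trim([Fraction(c) for c in p])) for p in polys]
    rows = [r for r in rows if r]
    while len(rows) > 1:
        rows.sort(key=len)                 # lowest degree first
        pivot = rows[0]
        new_rows = [pivot]
        for r in rows[1:]:
            factor = r[0] / pivot[0]       # pivot[0] != 0 after shifting
            reduced = [r[i] - factor * (pivot[i] if i < len(pivot) else 0)
                       for i in range(len(r))]
            reduced = shift(trim(reduced)) # shifting lowers the degree
            if reduced:                    # zero rows are deleted
                new_rows.append(reduced)
        rows = new_rows
    g = rows[0]
    return [c / g[-1] for c in g]          # normalise to a monic polynomial

if __name__ == "__main__":
    # (s+1)(s+2), (s+1)(s+3), (s+1)(s+4); the common factor is s + 1.
    print(eres_exact([[2, 3, 1], [3, 4, 1], [4, 5, 1]]))
```

Every non-pivot row loses at least one degree per pass or is deleted, so the loop terminates. For inexact data this exact test is of no use, which is where the floating-point normalization and the SVD of the algorithm above come in.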



7.2.1.2 Complexity 

For a set of m polynomials the amount of floating point operations performed in the
kth iteration of the algorithm depends on the size of the matrix $P_m^{(k)}$. If the size of
$P_m^{(k)}$ is $m' \times n'$, the ERES algorithm requires $O(z^3/3)$, $z = \min\{m' - 1, n'\}$, operations
for the Gaussian elimination, $O(2m'n')$ operations for the normalization and
$O(m'n'^2 + n'^3)$ for the SVD process. The first iteration is the most computationally
expensive iteration since the initial matrix $P_m$ has larger dimensions than any $P_m^{(k)}$.
Unless we know exactly the degree of the GCD of the set we cannot specify from
the beginning the number of iterations required by the algorithm. Therefore, we
cannot express a general formula for calculating the total number of operations
that are required by the algorithm.



7.2.1.3 Behavior and Stability of the ERES Algorithm 

The combination of rational and numerical operations aims at the improvement of
the stability of the ERES algorithm and the presence of good approximate
solutions. The main iterative procedure of the algorithm, and especially the process
of Gaussian elimination, is entirely performed by using rational operations. With this
technique any additional errors from the Gaussian elimination are avoided. The
operations during the Gaussian elimination are always performed accurately and if
the input data are exactly known and a GCD exists, the output of the algorithm is
produced accurately from any row of the final unity rank matrix. Obviously,
rational operations do not reveal the presence of approximate solutions. In cases of
sets of polynomials with inexact coefficients, the presence of an approximate
solution relies on the proper determination of a numerical $\varepsilon_t$-rank 1 matrix for a
specific accuracy $\varepsilon_t$. Therefore, the singular value decomposition together with the
normalization process of the matrix $P_m^{(k)}$ are performed by using floating-point
operations. The polynomial that comes from the right singular vector that
corresponds to the unique singular value of the last unity rank matrix can be
considered as a GCD approximation and represents the numerical output of the ERES
algorithm.

The normalization of the rows of any matrix $P_m^{(k)}$ (by the Euclidean norm) does
not introduce significant errors and in fact the following result can be proved [26]:

Proposition 7.1 The normalization $P_m^{(N)}$ of a matrix $P_m^{(k)} \in \mathbb{R}^{m' \times n'}$, computed by
the method in the k-th iteration, using floating-point arithmetic with unit round-off
u, satisfies the properties

\[ P_m^{(N)} = N \cdot P_m^{(k)} + E_N, \qquad \| E_N \|_\infty \le 3.003 \cdot n' \cdot u \]

where $N = \operatorname{diag}(d_1, d_2, \ldots, d_{m'}) \in \mathbb{R}^{m' \times m'}$, $d_i = \big( \| P_m^{(k)}[i, 1, \ldots, n'] \|_2 \big)^{-1}$,
$i = 1, \ldots, m'$, is the matrix accounting for the performed transformations and
$E_N \in \mathbb{R}^{m' \times n'}$ is the error matrix.

It is important to notice that the SVD is actually applied to a numerical copy of
the matrix $P_m^{(k)}$ and thus the performed transformations during the SVD procedure
do not affect the matrix $P_m^{(k)}$ when returning to the main iterative procedure. For
this reason, there is no accumulation of numerical errors. The only errors
appearing are from the normalization and the singular value decomposition [8, 12]
of the last matrix $P_m^{(k)}$ and represent the total numerical error of the ERES
algorithm.

The combination of rational-symbolic and floating-point operations ensures the
stability of the algorithm and gives to the ERES the characteristics of a hybrid
computational method.



The numerical error which occurs from the conversion between rational and floating-point data 
is close to the software accuracy. 
7.2.2 The Resultant ERE (RERE) and Modified RERE (MRERE) Methods

Another method to compute the GCD of several polynomials is to triangularize the 
generalized Sylvester matrix S [2], which has the following form: 



S = 



So 
Si 



where the block Si,i — 1, . . ., m represents the Sylvester matrix of the i-th poly- 
nomial. In [6] we have modified the huge initial generalized Sylvester matrix in 
order to take advantage of its special form and modifying the classical procedures 
(such as SVD, QR and LU factorization) we reduce the required floating point 
operations from 0{n*) to 0{rP) flops making the algorithms efficient. More spe- 
cifically the application of Householder or Gaussian transformations to the mod- 
ified generalized Sylvester matrix requires only 0{{n+p) (21og2(n) — |) + 
(n+p) {2m\og2{n) + p)) flops and in the worse case where m = n=p the 
required flops will be equal to (9(161og2(«) ■ n^) or the half flops for the LU 
factorization respectively. In practice the flops that demand the previous methods 
are less because of the linear dependent rows which are zeroed and deleted during 
the triangularization of the matrix. 



7.2.2.1 Numerical Stability 



In [6] we proved that the final error matrix in the modified QR method is 

">l°g2( 



£ = E!=i*"'£/with 



\\E\\f <(/.(«)^(V^T^(||5*||f + ||((5*))||f) 



(7.2) 



where ((5'*) ) is the last triangularized sub-matrix, (p a slowly growing function of 
n and /,( the machine precision and the final error in the modified LU method is 



log2(n) 

1=1 



(7.3) 



with 11 E \\^ < (« + p)l'°s.n|p„ II s* lU +(« + p)> II as*)') lU), where p is the 
growth factor and u the unit round off. 
7.3 The Matrix Pencil Methods

7.3.1 The Standard Matrix Pencil Method (SMP)

The matrix pencil method [20, 23] is a direct method, which is based on system
properties of the GCD. The algorithm of the SMP method uses stable processes,
such as the SVD, for computing the right $\mathcal{N}_r$ and left $\mathcal{N}_l$ null spaces of appropriate
matrices. The SMP method requires the construction of the observability matrix
$Q(A, C) = [C^t, A^t C^t, \ldots, (A^t)^{d-1} C^t]^t$ of the companion matrix A of the
polynomial of maximal degree and the left null space C of a matrix $M_1$, as we will see
below. It is known that the computation of powers of matrices is not always stable.
As shown in [23], because of the special form of the companion matrix A and
the orthogonality of C, the computation of the powers of A and of their products
with C can fail only if very special relations hold between the coefficients of the
polynomials ([23], example (8)). An alternative is to carry out this computation
symbolically: there are no rounding errors and, because only one inner product,
for the last column, must be computed for each new power of A (the other
columns are the columns of the previous power of A shifted left), the increase in
computational time caused by the rational representation of the coefficients
and the symbolic computation of the products is modest. In this
manner we compute the observability matrix while avoiding one of the main
disadvantages of the SMP method (the computation of the powers of A) without
a significant increase in the required time.

Another stable way to compute the null space of Q is first to reduce the pair 
(A, C) to a block Hessenberg form without computing the observability matrix. 
From the staircase algorithm [9], we take an orthogonally similar pair (H, C), such 
that: Q{A,C) = P^[B,HB, . . .,H^'''^^B], where P is an orthogonal matrix such 
that PAP^ = H. Because the matrix H has much more elements than the com- 
panion matrix A, the computation of the powers of H demands more flops than 
those of the powers of A and thus in our case it is better to use the first way for the 
computation of the null space of Q. 

The main target of the SMP algorithm is to form the GCD pencil Z(s) and
to specify any minor of maximal order, which gives the required GCD. This
specification can be done symbolically. Let $\mathcal{P}_{m,d}$ be the set of polynomials as defined in
Sect. 7.2.



7.3.1.1 The SMP Algorithm

STEP 1: Compute a basis matrix M for the right nullspace $\mathcal{N}_r(P_m)$ using the SVD
        algorithm.
STEP 2: Construct $M_1$ by deleting the last row of M.
STEP 3: Compute the matrix C such that $C M_1 = 0$.
STEP 4: Construct the observability matrix: $Q(A, C) = [C^t, A^t C^t, \ldots, (A^t)^{d-1} C^t]^t$.
STEP 5: Compute the right nullspace $W = \mathcal{N}_r(Q(A, C))$ using the SVD algorithm.
STEP 6: Construct the pencil $Z(s) = sW - AW$. Any minor of maximal order of
        Z(s) defines the GCD of the set of polynomials.



7.3.1.2 Complexity 

The computation of the right nullspace of P„ requires 0{mn^ + ^^) flops, of the 
of the matrix C demands 0{{r — l)n^ + ^y^) flops, where r = p{Pm)- 

The computation of C such that CMi = requires 0{4fid^ + Sd^) flops 
applying the SVD algorithm to M[, where ii — d -- r + l,r — p{Pm) {*)■ The 
computation of any minor of maximal order of Z{s) can be done symbolically 
using the LU factorization. 

Totally the Standard Matrix Pencil method demands: 0{4md^ + 4d^{r — 1) + 
4fid^ + 24d^ +^d^r) flops. If the computation of Q is done symbolically the 
required flops are diminished by the flops in (*) but the computational time is 
increased slightly. 



7.3.1.3 Numerical Stability 

The Standard Matrix Pencil Method requires two SVD calls and the construction 
of the observability matrix. 

The numerical computation of the powers of A {Ay ' is in practise stable. Of 
course there are no errors in symbolically implementation of this step. Since the 
matrix {Ay ' is computed, the numerical computation of the product {A'y 'C' — 
{C{Ay ')' is stable because the matrix C is orthonormal. For the last matrix 
multiplication it holds: yZ(C(A)<'^') = C(A)<*'+£, with |l£||2 <^^Mi||C||2||A*||2 = 
cf^MilJA'^llj, where /Z(-) denotes the computed floating point number and ui is of 
order of unit round off. The computation of the minor of maximal order of Z{s) is 
done symbolically and so there are no rounding off errors. 



7.3.2 The Modified Resultant Matrix Pencil Method (MRMP) 

The modified matrix pencil method [23] is similar to the SMP method but is
based on the modified Sylvester matrix $S^*$. It constructs another GCD pencil Z(s)
and specifies any minor of maximal order, which gives the required GCD. This
specification can also be done symbolically.
7.3.2.1 The MRMP Algorithm

STEP 1: Define a basis M for the right nullspace of the modified Sylvester matrix
        $S^*$ using the modified QR factorization in the first phase of the SVD.
STEP 2: Define the Matrix Pencil $Z(s) = sM_1 - M_2$ for the Resultant set, where
        $M_1$, $M_2$ are the matrices obtained from M by deleting the last and the
        first row of M, respectively.
STEP 3: Compute any non-zero minor determinant d(s) of Z(s) and thus obtain
        GCD = d(s).

7.3.2.2 Complexity 

The Modified Resultant Matrix Pencil method requires 0{{n + p) {2log2{n) — 5) + 

(« + p) (2mlog2(n) + p) + I2k{ri + p) ), where k is the number of the calls of the 
SVD-step. 



7.3.2.3 Numerical Stability 

The computed GCD is the exact GCD of a slightly disrupted set of the initial 
polynomials. The final error is E ^ Ei + E2, with ||£i||^ < (^(n)^/!!^!!^ and 
\\E\\2 < (ipl") + c{m,n) + c{m,n) ■ (p{n) ■ u) ■ u ■ \\S\\p where u is the unit round 
off error, || • ||^ the Frobenius norm and (p{n) is a slowly growing function of n [8] 
and c(rn, n) is a constant depending on m, n. 



7.3.3 Another Subspace-Based Method for Computing 
the GCD of Several Polynomials (SS) 

The subspace concept is actually very common among several methods for 
computing the GCD of many polynomials, including those we described in the 
previous sections. The SVD procedure applied to a generalized Sylvester matrix is 
the basic tool for a subspace method. A representative and rather simple algorithm, 
which approaches the GCD problem from the subspace concept, is presented in 
[30] and we shall refer to it as the SS algorithm. 

Given a set of univariate polynomials $\mathcal{P}_{m,n}$, the first two steps of the SS
algorithm involve the construction of an $m(n+1) \times (2n+1)$ generalized
Sylvester matrix Y from the input polynomials and the computation of the left null
space of the transposed matrix $Y^T$ via SVD. If we denote by $U_0 \in \mathbb{R}^{(2n+1) \times k}$ the basis
matrix for the computed left null space of $Y^T$ and C is the $(2n+1) \times (2n+1-k)$
Toeplitz matrix of a degree k polynomial with arbitrary coefficients, then the GCD
vector is actually the unique (up to a scalar) solution of the system $U_0^T C = 0$ [30].
Obviously, the degree of the GCD is $k = \operatorname{colspan}\{U_0\}$, i.e., the number of columns of $U_0$.

For the approximate GCD problem, an equivalent and more appropriate way to
compute the GCD vector with the SS algorithm is to construct k Hankel matrices
$U_i \in \mathbb{R}^{(k+1) \times (2n+1-k)}$, $i = 1, \ldots, k$, from the columns of $U_0$, form the matrix
$U = [U_1, \ldots, U_k] \in \mathbb{R}^{(k+1) \times k(2n+1-k)}$ and compute a basis matrix $V_0$ for the left null
space of U by using the SVD procedure. The last column of $V_0$, which corresponds
to the smallest singular value (expected to be zero), contains the k + 1 coefficients
of the GCD. The resulting GCD can be considered as an approximate $\varepsilon$-GCD for a
tolerance $\varepsilon$ equal to the machine's numerical precision. However, for a different
tolerance $\varepsilon$, we can select a singular value $\sigma_j$ from the singular value decomposition
of $Y^T$ such that $\sigma_j > \varepsilon \cdot f(h, n)$ and $\sigma_{j+1} \le \varepsilon$ [7], and compute an $\varepsilon$-GCD of
degree $k' = 2n + 1 - j \ne k$.
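
The two stages described above (null space of the generalized Sylvester matrix, then the common left null vector of the Hankel blocks) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [30]: the function name `ss_gcd`, the descending coefficient ordering, the zero-padding of lower-degree inputs and the relative rank tolerance `tol` are all choices made for this example.

```python
import numpy as np

def ss_gcd(polys, tol=1e-9):
    """Subspace-style GCD estimate; coefficients in descending powers of s."""
    n = max(len(p) for p in polys) - 1
    width = 2 * n + 1
    rows = []
    for p in polys:
        c = np.zeros(n + 1)
        c[n + 1 - len(p):] = p                  # pad lower-degree inputs with leading zeros
        for shift in range(n + 1):              # n + 1 shifted copies per polynomial
            row = np.zeros(width)
            row[shift:shift + n + 1] = c
            rows.append(row)
    Y = np.array(rows)                          # m(n+1) x (2n+1)
    _, sv, Vt = np.linalg.svd(Y)
    rank = int(np.sum(sv > tol * sv[0]))
    k = width - rank                            # estimated GCD degree
    if k == 0:
        return np.array([1.0])
    U0 = Vt[rank:].T                            # (2n+1) x k basis, left null space of Y^T
    # One (k+1) x (2n+1-k) Hankel block per column of U0.
    blocks = [np.array([u[i:i + width - k] for i in range(k + 1)]) for u in U0.T]
    U = np.hstack(blocks)                       # (k+1) x k(2n+1-k)
    W, _, _ = np.linalg.svd(U)
    g = W[:, -1]                                # left singular vector of the smallest singular value
    return g / g[0]                             # normalize to a monic polynomial

p1 = [1, 0, -7, 6]       # (s - 1)(s - 2)(s + 3)
p2 = [1, -8, 17, -10]    # (s - 1)(s - 2)(s - 5)
p3 = [1, -2, -1, 2]      # (s - 1)(s - 2)(s + 1)
print(ss_gcd([p1, p2, p3]))   # approximately [1, -3, 2], i.e. s^2 - 3s + 2
```

Raising `tol` enlarges the computed null space and therefore the degree of the returned polynomial, which mirrors the dependence of the $\varepsilon$-GCD on the chosen tolerance.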

Although it is not mentioned in [30], the computational cost of the SS algorithm 
is dominated by the SVD of the generalized Sylvester matrix Y' , which requires 
0{2m^rP + 5rrp-rr) flops, [12]. However, the stability and the effectiveness of the 
algorithm in large sets of polynomials are not well documented in [30], and
additionally there is no reference to the total numerical error of the algorithm.
In practice, the performance of the SS algorithm is good when using floating-point
operations of medium to high accuracy, but it becomes very slow in hybrid
computations.



7.4 Approximate Solutions 

It is well known that, when working with inexact data in a computational envi- 
ronment with limited numerical accuracy, the outcome of a numerical algorithm is 
usually an approximation of the expected exact solution due to the accumulation of 
numerical errors. In the case of GCD algorithms, the solution produced can be 
considered either as an approximate solution of the original set of polynomials, 
within a tolerance $\varepsilon$, or as the exact solution of a perturbed set of polynomials. The
following definition is typical for the approximate GCD. 

Definition 7.1 Let $\mathcal{P}_{m,n} = \{a(s), b_i(s),\ i = 1, \ldots, m-1\}$ be a set of univariate
polynomials as defined in (1) and $\varepsilon > 0$ a fixed numerical accuracy. An almost
common divisor ($\varepsilon$-divisor) of the polynomials of the set $\mathcal{P}_{m,n}$ is an exact common
divisor of a perturbed set of polynomials $\mathcal{P}'_{m,n} = \{a(s) + \Delta a(s),\ b_i(s) + \Delta b_i(s),\
i = 1, \ldots, m-1\}$, where the polynomial perturbations satisfy $\deg\{\Delta a(s)\} \le
\deg\{a(s)\}$, $\deg\{\Delta b_i(s)\} \le \deg\{b_i(s)\}$ and

$\|\Delta a(s)\|^2 + \sum_{i=1}^{m-1} \|\Delta b_i(s)\|^2 < \varepsilon$    (7.4)

An approximate GCD ($\varepsilon$-GCD) of the set $\mathcal{P}_{m,n}$ is an $\varepsilon$-divisor of maximum degree.

The computation of the GCD of a set of polynomials is nongeneric. Generally, in
most GCD problems the degree of the GCD is unknown and thus a numerical 
algorithm can easily produce misleading results. An approach to this problem is to 
certify and fix a maximum degree according to appropriate theorems and tech- 
niques [11, 32] and proceed with the computation of a common divisor of this 
particular degree. The evaluation of the strength of a given approximation is 
another important issue here. 
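
The dependence of the detected GCD degree on the tolerance can be seen in a small Python experiment. The sketch below counts the singular values of the classical Sylvester matrix of two polynomials that fall below a relative threshold; using this count as the degree of an approximate GCD is a common heuristic and an assumption of the example, not a procedure taken from [11, 32]. The function names are hypothetical.

```python
import numpy as np

def sylvester2(a, b):
    """Classical Sylvester matrix of two polynomials (descending coefficients)."""
    n, p = len(a) - 1, len(b) - 1
    S = np.zeros((n + p, n + p))
    for i in range(p):
        S[i, i:i + n + 1] = a          # p shifted copies of a(s)
    for i in range(n):
        S[p + i, i:i + p + 1] = b      # n shifted copies of b(s)
    return S

def numerical_gcd_degree(a, b, eps):
    """Number of singular values not exceeding eps times the largest one."""
    sv = np.linalg.svd(sylvester2(a, b), compute_uv=False)
    return int(np.sum(sv <= eps * sv[0]))

a = [1.0, -3.0, 2.0]            # (s - 1)(s - 2)
b = [1.0, -4.0, 3.0 + 1e-8]     # (s - 1)(s - 3) with a perturbed constant term
print(numerical_gcd_degree(a, b, 1e-6))    # 1: the common root s = 1 is detected
print(numerical_gcd_degree(a, b, 1e-14))   # 0: the perturbed pair is treated as coprime
```

The perturbed pair is coprime in exact arithmetic, yet at the looser tolerance it has an approximate common divisor of degree one, which is the situation Definition 7.1 is meant to capture.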

The definition of the approximate GCD as the exact GCD of a perturbed set has 
led to the development of a general approach for defining the approximate GCD, 
evaluating the strength of approximation and finally defining the notion of the 
optimal approximate GCD as a distance problem [19]. In fact, recent results on the 
representation of the GCD [13, 19], using Toeplitz matrices and generalized 
resultants (Sylvester matrices), allow the reduction of the approximate GCD 
computation to an equivalent approximate factorization of generalized resultants. 
Specifically, for a given set of polynomials $\mathcal{P}_{m,n}$ and its GCD of degree k:

$g(s) = s^k + \lambda_1 s^{k-1} + \cdots + \lambda_k, \quad \lambda_k \ne 0$

it holds that [13]:

$S_{\mathcal{P}} = [0_k \mid S_{\mathcal{P}^*}] \cdot \Phi_g$,    (7.5)

where $S_{\mathcal{P}}$ is the $(mn+p) \times (n+p)$ Sylvester matrix of the set $\mathcal{P}_{m,n}$, $0_k$ is the
$(mn+p) \times k$ zero matrix, $S_{\mathcal{P}^*}$ is the $(mn+p) \times (n+p-k)$ Sylvester matrix of
the set $\mathcal{P}^*_{m,n-k}$ of coprime polynomials, obtained from the original set $\mathcal{P}_{m,n}$ after
dividing its elements by the GCD $g(s)$ and, finally, $\Phi_g$ is the $(n+p) \times (n+p)$
lower triangular Toeplitz-like matrix of the polynomial $g(s)$.

We now define the strength of an r-order approximate common divisor of a
polynomial set $\mathcal{P}_{m,n}$ [19]:

Definition 7.2 Let $\mathcal{P}_{m,n}$ and $v(s) \in \mathbb{R}[s]$, $\deg\{v(s)\} = r \le p$. The polynomial $v(s)$
is an r-order approximate common divisor of $\mathcal{P}_{m,n}$ and its strength is defined as a
solution of the following minimization problem:

$f(\mathcal{P}, \mathcal{P}^*) = \min_{\forall \mathcal{P}^*} \|S_{\mathcal{P}} - [0_r \mid S_{\mathcal{P}^*}] \cdot \Phi_v\|_F$    (7.6)

Furthermore, $v(s)$ is an r-order approximate GCD of $\mathcal{P}_{m,n}$ if the minimum
corresponds to a coprime set $\mathcal{P}^*_{m,n-r}$ or to a full rank $S_{\mathcal{P}^*}$.

We prefer to use as a metric the Frobenius matrix norm [8] denoted by $\|\cdot\|_F$,
which relates in a direct way to the set of polynomials. However, the minimization
problem in Definition 7.2 cannot be solved easily, because it may involve too
many arbitrary parameters.

Let us have a polynomial $v(s) \in \mathbb{R}[s]$ of degree r given as a solution by a GCD
algorithm. We consider it as an exact GCD of a perturbed set of polynomials $\mathcal{P}'_{m,n}$
of the form:

$\mathcal{P}'_{m,n} = \mathcal{P}_{m,n} - \mathcal{Q}_{m,n}$    (7.7a)

$\quad = \{p'_i(s) = p_i(s) - q_i(s) : \deg\{q_i(s)\} \le \deg\{p_i(s)\},\ i = 1, \ldots, m\}$    (7.7b)

where $\mathcal{Q}_{m,n}$ denotes the set of polynomial perturbations [13]. The polynomials of
the set $\mathcal{Q}_{m,n}$ have arbitrary coefficients, which pass to the polynomials of the set
$\mathcal{P}'_{m,n}$. If we use now the respective generalized resultants (Sylvester matrices) for
each set in Eq. (7.7a), the following relation appears:

$S_{\mathcal{P}'} = S_{\mathcal{P}} - S_{\mathcal{Q}}$    (7.8)

It is clear that the exact GCD of a set of polynomials yields $S_{\mathcal{Q}} = O \Leftrightarrow \|S_{\mathcal{Q}}\|_F = 0$.
Therefore, we may consider a polynomial as a good approximation of the exact
GCD, if $\|S_{\mathcal{Q}}\|_F$ is close enough to zero. In the following, our intention is to find
some bounds for $\|S_{\mathcal{Q}}\|_F$.

If we use the factorization of generalized resultants as described in (7.5), we will
have:

$S_{\mathcal{Q}} = S_{\mathcal{P}} - [0_r \mid S_{\mathcal{P}^*}] \cdot \Phi_v \;\Leftrightarrow\; S_{\mathcal{Q}} \cdot \Phi_v^{-1} = S_{\mathcal{P}} \cdot \Phi_v^{-1} - [0_r \mid S_{\mathcal{P}^*}]$    (7.9)

where $\Phi_v^{-1}$ is the inverse of $\Phi_v$. It is important to notice here that $\mathcal{P}^*$ contains
arbitrary parameters. We can select specific values for these parameters such that:

$S_{\mathcal{P}} \cdot \Phi_v^{-1} - [0_r \mid S_{\mathcal{P}^*}] = [\hat{S}^{(r)} \mid 0_{n+p-r}] = \hat{S}$    (7.10)

Therefore, from Eqs. (7.9) and (7.10) it follows:

$S_{\mathcal{Q}} \cdot \Phi_v^{-1} = \hat{S} \;\Leftrightarrow\; S_{\mathcal{Q}} = \hat{S} \cdot \Phi_v$

and, since $\mathrm{Cond}(\Phi_v) = \|\Phi_v\|_F \|\Phi_v^{-1}\|_F \ge n + p$ [8], we conclude with the
following inequality:

$\frac{\|\hat{S}\|_F}{\|\Phi_v^{-1}\|_F} \le \|S_{\mathcal{Q}}\|_F \le \|\hat{S}\|_F \|\Phi_v\|_F$    (7.11)

If $v(s)$ has the same degree as the exact GCD of the set, the quantities

$\underline{S}_v = \frac{\|\hat{S}\|_F}{\|\Phi_v^{-1}\|_F}$ and $\overline{S}_v = \|\hat{S}\|_F \|\Phi_v\|_F$    (7.12)

characterize the quality of the proximity of $v(s)$ to the exact GCD of the set $\mathcal{P}_{m,n}$
and we shall refer to them as the minimum and maximum strength numbers of $v(s)$,
respectively.

These strength numbers are useful indicators for the evaluation of the strength
of a given approximate GCD. More particularly, if $\underline{S}_v > 1$, then the strength of the
given approximation is bad and the opposite holds if $\overline{S}_v < 1$. Normally, we prefer
solutions with maximum strength number $\overline{S}_v \ll 1$ or, better, close to the numerical
software accuracy of the system. Otherwise, we have to solve the minimization
problem (7.6) to find the actual strength. The advantage is that the computation of
the strength numbers is straightforward and can give us information about the
strength of a GCD before we go to an optimization method.
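
The two strength numbers only require Frobenius norms, which the following Python sketch makes explicit. It is an illustration under assumptions: the residual block `S_hat` is filled with random placeholder values rather than derived from a real polynomial set, and the lower triangular Toeplitz matrix built by the hypothetical helper `toeplitz_lower` is only one plausible reading of the "Toeplitz-like" matrix $\Phi_v$. The sketch then checks the sandwich inequality (7.11) for $S_{\mathcal{Q}} = \hat{S}\,\Phi_v$.

```python
import numpy as np

def toeplitz_lower(v, size):
    """Lower triangular Toeplitz matrix whose first column starts with the coefficients of v."""
    Phi = np.zeros((size, size))
    for j in range(size):
        for i, c in enumerate(v):
            if i + j < size:
                Phi[i + j, j] = c
    return Phi

def strength_numbers(S_hat, Phi_v):
    """Minimum and maximum strength numbers as in (7.12)."""
    norm_S = np.linalg.norm(S_hat, "fro")
    lower = norm_S / np.linalg.norm(np.linalg.inv(Phi_v), "fro")
    upper = norm_S * np.linalg.norm(Phi_v, "fro")
    return lower, upper

rng = np.random.default_rng(0)
r, size = 2, 5                                        # r = deg v, size = n + p
Phi_v = toeplitz_lower([1.0, -3.0, 2.0], size)        # v(s) = s^2 - 3s + 2
S_hat = np.zeros((7, size))
S_hat[:, :r] = 1e-10 * rng.standard_normal((7, r))    # placeholder residual block
lower, upper = strength_numbers(S_hat, Phi_v)
S_Q = S_hat @ Phi_v
print(lower <= np.linalg.norm(S_Q, "fro") <= upper)   # True
```

Because both numbers follow from two norms and one inverse of a triangular matrix, they are cheap to evaluate before any optimization over the free parameters is attempted.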

7.4.1 Computational Examples 

The previous methods have been applied to many sets of polynomials. The final 
results using variable floating point, symbolic and hybrid operations are presented 
in Tables 7.1, 7.2, 7.3, and 7.4. The following notation is used in the tables. 

m: the number of polynomials 

n: the maximum degree of the polynomials 

p: the second maximum degree of the polynomials 

d: the degree of the GCD 

Tol: the numerical accuracy $\varepsilon$

Rel: the numerical relative error 

Table 7.1 Comparison of algorithms: Tol = 10^-1?

                  ERES      RERE (LU)                MRERE (LU)               RERE (QR)
Example           Hybrid    Num             Sym      Num             Sym      Num             Sym
I    Dig          16        32              -        32              -        35              -
     Rel                    0.50 × 10^-24            0.50 × 10^-24            0.13 × 10^-2?
     Strength               0.20 × 10^-22            0.20 × 10^-22            0.37 × 10^-23
     Time         1.842     0.020           0.120    0.081                    0.060           0.270
     Flops        162,140   14,400          -        9,790           -        32,400          -
II   Dig          16        24              -        25              -        25              -
     Rel                    0.90 × 10^-21            0.12 × 10^-19            0.40 × 10^-2?
     Strength               0.21 × 10^-20            0.18 × 10^-19            0.24 × 10^-2?
     Time         1.342     0.260           2.190    0.161           1.432    0.881           4.513
     Flops        112,437   176,000         -        87,529          -        362,667         -
III  Dig          16        18              -        18              -        19              -
     Rel                    0.68 × 10^-?             0.68 × 10^-?             0.13 × 10^-?
     Strength               0.32 × 10^-1?            0.32 × 10^-16            0.66 × 10^-?
     Time         2.794     0.010           0.340    0.007           0.234    0.030           0.793
     Flops        162,140   6,912           -        4,759           -        16,128          -

Example I: m = 2, n = 16, p = 14, d = 4; Example II: m = 11, n = 17, p = 17, d = 3;
Example III: m = 2, n = 12, p = 12, d = 6

Table 7.2 Comparison of algorithms: Tol = 10^-1?

                  MRERE (QR)                MRMP            MP              SS
Example           Num             Sym       Hybrid          Hybrid          Num
I    Dig          35                        25              27              25
     Rel          0.13 × 10^-?              0.26 × 10^-?    0.19 × 10^-13   2.9 × 10^-12
     Strength     0.37 × 10^-23             0.31 × 10^-?    0.96 × 10^-?    0.2 × 10^-?
     Time         0.050           0.169     0.050           0.060           0.985
     Flops        19,580                    267,080         121,314         91,898,532
II   Dig          25                        20              21              20
     Rel          0.95 × 10^-21             0.25 × 10^-?    0.80 × 10^-?    6.0 × 10^-?
     Strength     0.15 × 10^-20             0.56 × 10^-1?   0.33 × 10^-1?   5.51 × 10^-?
     Time         0.381           2.178     3.395           0.861           5.360
     Flops        175,058                   761,725         56,664          4.1 × 10^?
III  Dig          19                        19              21              20
     Rel          0.13 × 10^-?              0.65 × 10^-12   0.59 × 10^-?    6.9 × 10^-?
     Strength     0.66 × 10^-1?             0.38 × 10^-?    0.28 × 10^-1?   3.35 × 10^-1?
     Time         0.020           0.468     2.835           0.290           0.579
     Flops        9,517                     126,720         50,832          24,082,500

Example I: m = 2, n = 16, p = 14, d = 4; Example II: m = 11, n = 17, p = 17, d = 3;
Example III: m = 2, n = 12, p = 12, d = 6

Strength: the strength of the GCD 

Dig: the digits of software accuracy 

Time: the required time in seconds 

Flops: the required flops 

Num: numerical implementation 

Sym: symbolical implementation 

Hybrid: Hybrid implementation 

In Tables 7.1 and 7.2 the results of the following example sets are presented: 

Example I: Two polynomials of degree 16 and 14 with integer coefficients of 8 
digits and GCD degree 4, [2]. 

Example II: Eleven polynomials of degree 17 with integer coefficients of 2 
digits and GCD degree 3, [27]. 

Example III: Two polynomials of degree 12 and GCD degree 6, (example 1, 
[36]). The roots of the polynomials spread on the circles of radius 0.5 and 1.5. 

Comment: In Tables 7.1 and 7.2 the tolerance (Tol) is fixed and the digits are
variable. In Tables 7.3 and 7.4 both tolerance and digits are fixed.

7.5 Numerical, Symbolical and Hybrid 
Behavior of the Methods 



All the sets of polynomials have been tested numerically and symbolically. Exe- 
cuting the programs symbolically, there are no floating point errors during the 
processes and the final result is the exact GCD of the polynomials. But the time 
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required for symbolical computations is considerable. On the other hand, the 
floating-point operations and the accumulation of rounding errors force us to use 
accuracies on which the GCD is dependent. Different accuracies may lead to 
different GCDs. This is the main disadvantage of numerical methods. However, 
the required time is less than the corresponding time of symbolical methods. The 
use of the partial SVD [25, 34] can reduce the execution time of ERES, MRMP 
and SMP methods. 
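
The contrast between exact and inexact data can be reproduced with a few lines of Python. The sketch below is only an illustration: it uses the classical Euclidean algorithm in exact rational arithmetic (the standard library type `fractions.Fraction`) as a stand-in for a symbolical computation, not one of the matrix-based methods compared in this chapter, and the function names are hypothetical. It shows that a perturbation of size $10^{-12}$ in one coefficient already changes the exact GCD from degree 2 to degree 0.

```python
from fractions import Fraction

def poly_rem(a, b):
    """Remainder of a(s) divided by b(s); coefficients in descending powers."""
    a = list(a)
    while len(a) >= len(b):
        q = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= q * b[i]
        a.pop(0)                      # the leading coefficient is now exactly zero
    while a and a[0] == 0:            # strip remaining leading zeros
        a.pop(0)
    return a

def poly_gcd(a, b):
    """Monic GCD by the Euclidean algorithm in exact rational arithmetic."""
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while b:
        a, b = b, poly_rem(a, b)
    return [c / a[0] for c in a]

p1 = [1, 0, -7, 6]                    # (s - 1)(s - 2)(s + 3)
p2 = [1, -8, 17, -10]                 # (s - 1)(s - 2)(s - 5)
print(poly_gcd(p1, p2))               # s^2 - 3s + 2, as exact fractions

p2_perturbed = [1, -8, 17, Fraction(-10) + Fraction(1, 10**12)]
print(poly_gcd(p1, p2_perturbed))     # the constant 1: the perturbed pair is coprime
```

An exact computation therefore answers the question for the data as given, while a numerical method with a tolerance answers it for all nearby data, which is why the chosen accuracy affects the result.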

Having thoroughly tested the ERES, RERE, MRERE and SMP methods for
computing the GCD of several sets of polynomials, we have reached the following
conclusions about the behavior of the methods:

(a) The ERES method behaves very well in Hybrid mode, since it produces 
accurate results fast enough. The method becomes slower in the case of small 
sets of polynomials of high degree. 

(b) The MRERE methods behave very well in numerical mode for various kinds 
of sets of polynomials and especially in large sets of polynomials of high 
degree, but lose their speed, when using symbolic type of operations. 

(c) The RERE methods demand significantly more flops in comparison with 
MRERE methods and their complexity makes them inefficient methods. 

(d) The MRMP method demands much more time and flops than the SMP method 
without producing better results. 

(e) It seems that for large sets of linearly dependent polynomials the Hybrid ERES
and the Numerical MRERE yield better results within acceptable limits of time and
number of digits, with negligible relative error and strength number.

A subtle point in the numerical calculations is that it is not always easy to 
specify the exact accuracy to get numerically the correct GCD. This can be 
overcome in symbolical implementation but the more polynomials and the higher 
degrees we have, the more time we need to compute the GCD. Thus, a combi- 
nation of the above implementations can lead to an improvement in the overall 
performance of the algorithms. Of course the hybrid implementation depends on 
the nature of the algorithm. Not all algorithms can benefit from a hybrid
environment as ERES does [5]. The conversion of the data to an appropriate type
often leads to more accurate computations and thus fewer numerical errors.



In the following table we compare the algorithms computing the GCD of
polynomials. We propose the most suitable algorithm for the computation of the
GCD of two or several polynomials according to its stability, complexity and
required computational time. In the following table, m denotes the number of
polynomials and n the maximum degree (Table 7.5).

Table 7.5 Comparison of algorithms: computation of the GCD of polynomials

Method   m        n     Stability  Decision
ERES     Two      High  Stable     Not proposed
RERE     Two      High  Stable     Proposed
MRERE    Two      High  Stable     Proposed
SMP      Two      High  Stable     Not proposed
MRMP     Two      High  Stable     Not proposed
ERES     Several  High  Stable     Proposed
RERE     Several  High  Stable     Not proposed
MRERE    Several  High  Stable     Proposed
SMP      Several  High  Stable     Proposed
MRMP     Several  High  Stable     Not proposed



7.6 Conclusions 

According to the comparison of algorithms that we made, we conclude with the 
following: 

1. Matrix-based methods can handle many polynomials simultaneously, without 
resorting to the successive two at a time computations of the Euclidean or other 
pairwise based approaches. Although the computation of the GCD of a pair of 
polynomials, by using a Euclid-based algorithm, can be stable, it is not quite 
clear if such method can be generalised to a set of several polynomials and how 
this generalisation affects the complexity of the algorithm and the accuracy of 
the produced results. The sequential computation of the GCD in pairs often
leads to an excessive accumulation of numerical errors, especially in the case of
large sets of polynomials, and can therefore produce erroneous results. On the other
hand, matrix-based methods tend to have better performance and numerical 
stability in the case of large sets of polynomials. 

2. The development of robust computational procedures for engineering type 
models always has to take into account that the models have certain accuracy 
and that it is meaningless to continue computations beyond the accuracy of the 
original data set. Therefore, it is necessary to develop proper numerical ter- 
mination criteria that allow the derivation of approximate solutions to the GCD 
computation problem [20, 26]. In [19] the definition of the approximate GCD is 
considered as a distance problem in a projective space. The new distance 
framework given for the approximate GCD provides the means for computing 
optimal solutions, as well as evaluating the strength of ad-hoc approximations 
derived from different algorithms. 

3. The combination of symbolic-numeric operations performed effectively in a 
mixture of numerical and symbolical steps can increase the performance of a 
matrix-based GCD algorithm. Generally, symbolic processing is used to 
improve on the conditioning of the input data, or to handle a numerically ill- 
conditioned subproblem, and numeric tools are used in accelerating certain 
parts of an algorithm, or in computing approximate outputs. The effective 
combination of symbolic and numerical operations depends on the nature of an 
algebraic method and the proper handling of the input data either as rational or 
floating-point numbers. Symbolic-numeric implementation is possible in soft- 
ware programming environments with symbolic-numeric arithmetic capabili- 
ties such as Maple, Mathematica, Matlab and others, which involve the efficient 
combination of exact (rational-symbolic) and numerical (floating-point) 
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operations. This combination gives a different perspective in the way to 
implement an algorithm and introduces the notion of hybrid computations. The 
nature of the ERES method allows the implementation of a programming
algorithm that combines in an optimal setup the symbolical application of row
transformations and shifting, and the numerical computation of an appropriate
termination criterion, which can provide the required approximate solutions.
This combination highlights the hybridity of the ERES method and makes it the
most suitable method for the computation of approximate GCDs.

4. Most of the methods and algorithms, which were described in the previous
sections, perform a singular value decomposition (SVD). The information
that we need from the SVD often has to do with the smallest or the greatest
singular value (ERES, MP). Therefore a partial singular value decomposition
algorithm [34] can be applied in order to speed up the whole process.

The paper is focused on the development of matrix-based algorithms for the 
GCD problem of sets of several real univariate polynomials, which is considered a 
part of the fundamental problem of computing nongeneric invariants. 
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Chapter 8 

Numerical Computation of the Fixed Poles in Disturbance Decoupling for Descriptor Systems



Delin Chu and Y. S. Hung 



Abstract In this paper the algebraic characterizations for the fixed poles in the 
disturbance decoupling problem for descriptor systems are derived. These alge- 
braic characterizations lead to a numerically reliable algorithm for computing the 
fixed poles. The algorithm can be implemented directly using existing numerical 
linear algebra tools such as LAPACK and Matlab. 



8.1 Introduction 

Consider descriptor systems of the form 

$E\dot{x}(t) = Ax(t) + Bu(t) + Gq(t), \quad x(0) = x_0, \quad t \ge 0,$
$y(t) = Cx(t),$    (8.1)

where $E, A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $G \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{q \times n}$, and E is singular. The term
q(t) represents a disturbance, which may represent modelling or measuring errors,
noise or higher order terms in linearization. The existence and uniqueness of
(classical) solutions to system (8.1) for sufficiently smooth input functions and
consistent initial values is guaranteed if (E, A) is regular, i.e., if $\det(\alpha E - \beta A) \ne 0$
for some $(\alpha, \beta) \in \mathbb{C}^2$. The system (8.1) is said to have index at most one if the
dimension of the largest nilpotent block in the Kronecker canonical form of (E, A)
is at most one, see Gantmacher [1]. It is well-known that systems that are regular 
and of index at most one can be separated into purely dynamical and purely 
algebraic parts (fast and slow modes), and in theory the algebraic part can be 
eliminated to give a reduced-order standard linear time-invariant system. The 
reduction process, however, may be ill-conditioned with respect to numerical 
computation. For this reason it is preferable to use descriptor system models rather 
than turning the system into a standard linear time-invariant model. For descriptor 
systems, most numerical simulation methods work well for systems of index at 
most one, and the usual class of piecewise continuous input functions can be used. 
Also classical techniques for important control applications like stabilization, pole 
assignment or linear quadratic control can be applied, see e.g., Bunse-Gerstner 
et al. [2], and Dai [3]. If the index is larger than one, however, then impulses arise 
in the response of the system if the control is not sufficiently smooth. This restricts 
the set of admissible input functions and also impulses can arise due to the 
presence of modelling, measurement, linearization and roundoff errors in the real 
system. The usual way to deal with higher index systems in the context of control 
systems is to choose an appropriate feedback control to ensure that the closed-loop 
system is regular and of index at most one, whenever this is possible. Techniques 
for the construction of such feedbacks have been developed in Bunse-Gerstner 
et al. [2] based on orthogonal transformations, which can be implemented as 
numerically stable algorithms. 
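
The elimination of the algebraic part mentioned above can be written out for the simplest case. The Python sketch below assumes, purely for illustration, that E already has the form diag(I_r, 0) and that the pencil is regular and of index at most one, so that the trailing block A22 of A is nonsingular; the reduced state matrix is then the Schur complement A11 - A12 A22^{-1} A21. The function name is hypothetical. The condition number of A22 is returned as well, since a large value signals the ill-conditioning of the reduction that motivates working with the descriptor form directly.

```python
import numpy as np

def eliminate_algebraic_part(A, r):
    """Reduced state matrix for E = diag(I_r, 0), assuming A22 is nonsingular."""
    A11, A12 = A[:r, :r], A[:r, r:]
    A21, A22 = A[r:, :r], A[r:, r:]
    A_red = A11 - A12 @ np.linalg.solve(A22, A21)   # Schur complement of A22
    return A_red, np.linalg.cond(A22)

# Well-conditioned algebraic part: the reduction is harmless.
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 0.0],
              [4.0, 0.0, 4.0]])
A_red, kappa = eliminate_algebraic_part(A, r=1)
print(A_red, kappa)          # reduced matrix [[-1.]], cond(A22) = 2

# Nearly singular algebraic part: the same formula becomes ill-conditioned.
A_bad = A.copy()
A_bad[1:, 1:] = [[1.0, 1.0], [1.0, 1.0 + 1e-10]]
_, kappa_bad = eliminate_algebraic_part(A_bad, r=1)
print(kappa_bad > 1e9)       # True
```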

The disturbance decoupling problem for standard linear time-invariant systems
(i.e., systems of the form (8.1) with E = I) is well studied, see Syrmos [4],
Willems and Commault [5], and Wonham [6]. The disturbance decoupling problem
for descriptor systems has been studied in Chu and Mehrmann [7], Ailon [8],
Banaszuk et al. [9], and Fletcher and Aasaraai [10]. When a state feedback of the
form u = Fx is applied to system (8.1), then the closed-loop system becomes

$E\dot{x} = (A + BF)x + Gq,$
$y = Cx.$    (8.2)

Hence, the disturbance decoupling problem for system (8.1) can be stated as 
follows: 

Definition 1.1 Given descriptor system (8.1).

(i) The disturbance decoupling problem (DDP) is solvable if there exists a matrix
$F \in \mathbb{R}^{m \times n}$ such that (E, A + BF) is regular and

$C(sE - A - BF)^{-1}G = 0.$    (8.3)

(ii) The disturbance decoupling problem with index requirement (DDPI) is solvable if
there exists a matrix $F \in \mathbb{R}^{m \times n}$ such that (E, A + BF) is regular and of index at
most one, and (8.3) holds.
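
Condition (8.3) can be probed numerically by evaluating the closed-loop transfer matrix at a few sample points. The Python sketch below does this for a small descriptor system constructed only for this illustration (two dynamic states and one algebraic state); the matrices, the feedback F and the function name are assumptions of the example, and sampling a few points is a sanity check rather than a proof that the rational matrix vanishes identically.

```python
import numpy as np

def closed_loop_transfer(E, A, B, G, C, F, s):
    """Evaluate C (sE - A - BF)^{-1} G at a single point s."""
    return C @ np.linalg.solve(s * E - A - B @ F, G)

E = np.diag([1.0, 1.0, 0.0])                 # singular E: third equation is algebraic
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
B = np.array([[1.0], [0.0], [0.0]])
G = np.array([[0.0], [1.0], [0.0]])          # the disturbance drives the second state
C = np.array([[1.0, 0.0, 0.0]])              # the output is the first state
F = np.array([[0.0, -1.0, 0.0]])             # u = -x2 cancels the coupling x2 -> x1
F0 = np.zeros((1, 3))                        # no feedback, for comparison

points = [0.5 + 1.0j, 2.0, -1.5 + 0.3j]
with_F = max(abs(closed_loop_transfer(E, A, B, G, C, F, s)).max() for s in points)
without_F = max(abs(closed_loop_transfer(E, A, B, G, C, F0, s)).max() for s in points)
print(with_F, without_F)     # zero with feedback, clearly nonzero without it
```

Here the closed-loop pencil is sE - (A + BF) = diag(s, s, -1), which is regular and of index at most one, so this particular F is a candidate solution of both the DDP and the DDPI for the example system.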
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For standard linear time-invariant systems, it was shown in Basile and Marro 
[11], Chu [12], and Malabre and Martinez Garcia [13] that when the disturbance 
decoupling problem is solvable, the closed-loop system is bound to have some 
fixed poles after applying any disturbance decoupling state feedback. We would 
expect this property to extend to the case of descriptor systems. However, for 
descriptor systems, the fixed poles in the disturbance decoupling problem have not 
been characterized yet in the literature. 

The fixed poles are very important in the system design because they cannot be 
shifted by any disturbance decoupling feedback and therefore they are related 
directly to the stability of the closed-loop system. From a system design point of
view, all poles of the designed system except these fixed poles should be located
in the given stability region. Hence, the fixed poles are of high interest in
systems and control. In this paper we will show that when the DDP or DDPI is 
solvable, there also exist fixed poles that are the closed-loop system poles, after 
applying any disturbance decoupling state feedback. Now we give the definition of 
the fixed poles in the DDP and DDPI. 

Definition 1.2 Given descriptor system (8.1).

(i) Assume that the DDP is solvable. The set of the fixed poles for DDP is
defined as

$\sigma_f := \bigcap_{F \in \mathcal{F}} \sigma(E, A + BF),$

where

$\mathcal{F} := \{F \in \mathbb{R}^{m \times n} \mid F \text{ solves the DDP}\},$

and $\sigma(E, A + BF)$ denotes the set of the finite eigenvalues of the pencil
(E, A + BF).

(ii) Assume that the DDPI is solvable. The set of the fixed poles for DDPI is
defined as

$\sigma_{fi} := \bigcap_{F \in \mathcal{F}_i} \sigma(E, A + BF),$

where

$\mathcal{F}_i := \{F \in \mathbb{R}^{m \times n} \mid F \text{ solves the DDPI}\}.$

We will extend the work in Chu [12] to descriptor systems and present algebraic
characterizations for $\sigma_f$ and $\sigma_{fi}$. These algebraic characterizations are
obtained using orthogonal transformations, which give a numerically reliable
algorithm to compute $\sigma_f$ and $\sigma_{fi}$. This algorithm can be implemented using existing
numerical linear algebra tools such as LAPACK and Matlab. Furthermore, as a
direct consequence of the algebraic characterizations, the solvability conditions for
the DDP and DDPI with stability are obtained. To our knowledge, the present
paper is the first one to get algebraic characterizations and develop a numerically
reliable algorithm for $\sigma_f$ and $\sigma_{fi}$.

In this paper we use the following notation: 

• $S_\infty(M)$ denotes a matrix with orthogonal columns spanning the right nullspace
of a matrix M;

• $\mathrm{rank}_g[\cdot](s)$ denotes the generic rank of a rational matrix function and rank(M)
denotes the standard rank of a matrix M;

• For any $M, N \in \mathbb{R}^{n \times n}$, $\sigma(M, N)$ denotes the set of the finite eigenvalues of the
pencil (M, N).



8.2 Preliminaries 

In this section we give some supporting results. The following lemma character- 
izes the regular pencils with index at most one. 

Lemma 1.3 [3] Given $E, A \in \mathbb{R}^{n \times n}$. Then the following are equivalent:

(i) The pencil (E, A) is regular and of index at most one;
(ii) $\mathrm{rank}([E \;\; A S_\infty(E)]) = n$;
(iii) $\deg(\det(sE - A)) = \mathrm{rank}(E)$.
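
Condition (ii) of Lemma 1.3 translates directly into a rank test. The Python sketch below is a minimal illustration, assuming that an SVD-based null space and the default tolerance of `numpy.linalg.matrix_rank` are adequate for the small examples shown; the function names are hypothetical.

```python
import numpy as np

def right_nullspace(M):
    """Matrix with orthonormal columns spanning the right null space of M."""
    _, _, Vt = np.linalg.svd(M)
    rank = np.linalg.matrix_rank(M)
    return Vt[rank:].T

def is_regular_index_one(E, A):
    """Test rank([E, A S(E)]) = n, i.e. condition (ii) of Lemma 1.3."""
    n = E.shape[0]
    S = right_nullspace(E)
    return bool(np.linalg.matrix_rank(np.hstack([E, A @ S])) == n)

A = np.eye(2)
E1 = np.diag([1.0, 0.0])                  # one dynamic and one algebraic equation
E2 = np.array([[0.0, 1.0], [0.0, 0.0]])   # nilpotent block of size two
print(is_regular_index_one(E1, A))        # True
print(is_regular_index_one(E2, A))        # False
```

For E1 the pencil sE - A has determinant -(s - 1), of degree 1 = rank(E1), in agreement with condition (iii); for E2 the determinant is the constant 1, of degree 0 < rank(E2) = 1.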

If the system is not regular or not of index at most one then feedback can be 
used to make the system regular (and of index at most one). Necessary and 
sufficient conditions, when this is possible, are given in the following Lemma. 

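Lemma 1.4 [3] Given $E, A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$.

(i) There exists $F \in \mathbb{R}^{m \times n}$ such that (E, A + BF) is regular if and only if

$\mathrm{rank}_g[sE - A \;\; B] = n.$

(ii) There exists $F \in \mathbb{R}^{m \times n}$ such that (E, A + BF) is regular and of index at most
one if and only if

$\mathrm{rank}[E \;\; A S_\infty(E) \;\; B] = n.$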

Let us now consider pole placement. The following lemma provides conditions 
under which the set of the finite poles of the closed-loop system can be arbitrarily 
placed. 

Lemma 1.5 [3] Given $E, A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$.

(i) If

$\mathrm{rank}[sE - A \;\; B] = n, \quad \forall s \in \mathbb{C},$    (8.4)

then there is an integer k > 0 such that for any conjugate set $\Lambda = \{\lambda_1, \ldots, \lambda_k\}$
there exists an $F \in \mathbb{R}^{m \times n}$ satisfying that (E, A + BF) is regular and

$\sigma(E, A + BF) = \Lambda.$    (8.5)

(ii) If (8.4) is true and

$\mathrm{rank}[E \;\; A S_\infty(E) \;\; B] = n,$    (8.6)

then for any conjugate set $\Lambda = \{\lambda_1, \ldots, \lambda_{\mathrm{rank}(E)}\}$ there exists an $F \in \mathbb{R}^{m \times n}$ such that
(E, A + BF) is regular and of index at most one, and (8.5) holds.

The next lemma is a simple generalization of Lemma 1.5, so its proof is 
omitted. 

Lemma 1.6 Given matrices of the forms 



sE-A = 


li 

sEi — Ai 




B = 


V 


}l2 


with l\ <li and B2 being of full row rank. 






(i) // 






rank 


sEi 


-A, Bi' 
Ai B2 


= /i 


+ I2, 


\fs<E 


c, 



then there is an integer k>Q such that that for any conjugate set A = {li , 
there exist a F G R*"^'' and a nonsingular matrix Z € R/"^'' such that 



{sE-A- BF)Z 



h 
sSn -An 




-An 




}/i 
}h 



with (£11,^11) regular and cr(£ii,.4ii) = A. 
(ii) If (8.7) is true and 



rank 



£1 Ai5oc(£i) -Bi 
A25oc(£i) B2 



= h+h, 



(8.7) 
(8.8) 



(8.9) 



then for any conjugate set A = {Ai, . . ., irank(£i)} there exist a F £ R"^'' and a 
nonsingular matrix Z £ R''^'' such that (8.8) holds, {En, Aw) is regular and of 
index at most one, and (j{Ey\, An) — A. 

The following two theorems, which characterize the solvability of the DDP and 
DDPI, are slight modifications of Theorems 6, 11 and 13 in Chu and Mehrmann 
[7], where their proofs can be found. 

Theorem 1.7 Given descriptor system (8.1). Then there exist orthogonal matrices
$U, V \in \mathbb{R}^{n \times n}$ such that
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U{sE-A)V-- 



'sEn - 


-An 


^£12- 


-A12 sEi3-A 


13 i£i4-Ai4' 


}«1 


SE21-A21 SE22- 


-A22 SE23-A23 SE24-A24 


}«2 


-A31 ^£32- 


-A32 ^£33-^33 ^£'34-^34 


}«3 


^£42 - 


- A42 ^£43 — A43 ^£44 — A44 


}n4 





5£53-A53 s£54-A54 


}«3 





s£64-A64_ 


}n6 




'Bi' 


}«i 




"Gi" 


}«1 




B2 


}«2 







}«2 


JB = 


B, 



}«3 
}«4' 


UG = 






}«4' 







}m 







}«3 







}«6 







}«6 





(8.10) 



where 



111 112 «3 W4 

cy=[o C2 C3 C4]' 



7^ «; = 2J "'■ + "3 + «6 = «, 



Gi, £21,^3 a«(i £42 fl''e of full row rank, £53 w nonsingular, and furthermore 



rank 



i£ii-Aii 5i Gi 

s£2i-A2i B2 

-A31 53 



h\ + h2 + n3, v.? e C, 



(8.11) 



raak;(s£42 — A42) = M4, rank(s£64 — A64) = «4, Vs £ C, (8-12) 

(8.13) 



rank 



.^£42 — A42 "^£43 — A43 .^£44 — A44 

^•£53 — A53 ^■£54 — A54 

.j£64 — A64 

C2 C3 C4 



= «2 + «3 + «4- 



Theorem 1.8 Given descriptor system (8.1). Assume that orthogonal matrices U
and V have been determined such that $(U(sE - A)V,\ UB,\ UG,\ CV)$ are in the
condensed form (8.10).

(i) The DDP is solvable if and only if the following three conditions hold: 

h6^n4, (8.14) 

«i + «2 <«i, (8.15) 
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rank„ 



sEu -All 


Bi 


SE21 — A21 


Bi 


-A31 


Bi 



"1 + "2 + «3- 



(8.16) 



(ii) The DDPI is solvable if and only if the conditions (8.14, 8.15) and the 
following two conditions hold: 



rank 



rank 



£11 
£21 



rank 



£32 
£42 



£11 Ai\S Bi 

E21 A2\S B2 

A3i5 B3 



M3 = rank(£') , 



«1 +«2 + "3, 



.17) 



(8.18) 



here, S ^ S^ 



En 
£21 



The main feature of the form (8.10) is that it is based on orthogonal transfor- 
mations, which can be implemented as numerically stable algorithms, thus guar- 
anteeing robust computation of the desired quantities, if this is possible. 
Furthermore, the conditions (8.14, 8.15, 8.17, 8.18) can be verified very easily. 
Now we show that the verification of the condition (8.16) can be done using the 
following condensed form (8.19). 

Theorem 1.9 Given descriptor system (8.1). Suppose orthogonal matrices U and
V have been determined such that $U(sE - A)V$, UB, UG and CV are in the form
(8.10).

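(i) Then there exist orthogonal matrices $P$ and $Q$ such that

\[
P\begin{bmatrix} sE_{11}-A_{11}\\ sE_{21}-A_{21}\\ -A_{31}\end{bmatrix}Q =
\begin{bmatrix}
s\Theta_{11}-\Phi_{11} & s\Theta_{12}-\Phi_{12} & s\Theta_{13}-\Phi_{13}\\
s\Theta_{21}-\Phi_{21} & s\Theta_{22}-\Phi_{22} & s\Theta_{23}-\Phi_{23}\\
0 & s\Theta_{32}-\Phi_{32} & s\Theta_{33}-\Phi_{33}\\
0 & 0 & s\Theta_{43}-\Phi_{43}
\end{bmatrix},
\qquad
P\begin{bmatrix}B_1\\ B_2\\ B_3\end{bmatrix} =
\begin{bmatrix}\Psi_1\\ 0\\ 0\\ 0\end{bmatrix},
\tag{8.19}
\]

with column blocks of sizes $\tau_1, \tau_2, \tau_3$ and row blocks of sizes
$\tilde\tau_1, \tilde\tau_2, \tau_2, \tilde\tau_4$, where

\[
\sum_{i=1}^{3}\tau_i = n_1, \qquad
\tilde\tau_1 + \tilde\tau_2 + \tau_2 + \tilde\tau_4 = \sum_{i=1}^{3}\tilde n_i;
\]

$\Psi_1$ and $\Theta_{21}$ are of full row rank, $\Theta_{32}$ is nonsingular, and

\[
\operatorname{rank}(s\Theta_{21}-\Phi_{21}) = \tilde\tau_2, \qquad
\operatorname{rank}(s\Theta_{43}-\Phi_{43}) = \tau_3, \qquad \forall s \in \mathbb{C}. \tag{8.20}
\]

(ii) Furthermore, the condition (8.16) holds if and only if $\tilde\tau_4 = \tau_3$.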



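Proof The form (8.19) is constructed in Appendix A, and (ii) follows directly
from the fact that

\[
\tilde n_1 + \tilde n_2 + \tilde n_3 = \tilde\tau_1 + \tilde\tau_2 + \tau_2 + \tilde\tau_4,
\qquad
\operatorname{rank}_g\begin{bmatrix}
sE_{11}-A_{11} & B_1\\
sE_{21}-A_{21} & B_2\\
-A_{31} & B_3
\end{bmatrix} = \tilde\tau_1 + \tilde\tau_2 + \tau_2 + \tau_3. \qquad \Box
\]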

In Sect. 8.3 we will characterize $\sigma_f$ and $\sigma_{fi}$ using $\sigma(\Theta_{32},\Phi_{32})$ and $\sigma(E_{53},A_{53})$ in
the forms (8.10) and (8.19). In order to prove these characterizations, we need to
refine the form (8.19) by non-orthogonal transformations given in the following
Lemma 1.10. However, we note here that the condensed form in Lemma 1.10 is
only required for the analytical derivation of $\sigma_f$ and $\sigma_{fi}$ in Theorem 1.11 in the next
section. When it comes to the numerical computation of $\sigma_f$ and $\sigma_{fi}$, there is no need
to determine the condensed form of Lemma 1.10. In other words, the use of non-orthogonal
transformations in Lemma 1.10 is purely for the purpose of exposition,
and has no implication for the numerical reliability of the algorithm to be developed in
the next section.

Lemma 1.10 Assume that the condensed forms (8.10) and (8.19) have already been
determined and that $\tilde\tau_4 = \tau_3$ in (8.19) (i.e., the condition (8.16) holds). Then there exist
nonsingular matrices $X$ and $Y$ such that

T2 
S&32 - <t'32 

-<i>i2 
-022 
-642 



XP 



sEn -All" 




^•£21 — A21 


QY = 


-A31 





XP 



Bi 
Bi 
B3 




^2 

^4 



}^2 
K4 



XP 



Gi 






Tl 




T3 







}T2 


?/-<i)ii -Oi3 


}Tl 


-O21 -O23 


}^2 


-641 -643 _ 


}^4 


-, 


63 


}t2 




Gi 


}?I 




G2 


K2' 


- 





}^4 





(8.21) 



where 



T2 + Ti + T3 = «1 , T2 + Ti + ^2 + ^4 



E 
(=1 



«;, 



G2 and ^4 are of full row rank, and 
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rank 



i/-<D„ 


-4)l3 


^1 


-621 


-4)23 


^2 


-641 


-4)43 


^4 



= f 1 + ^2 + U. V5 e C. 



(8.22) 



Proof See Appendix B. 



D 



8.3 Main Results 



We are ready to give the algebraic characterizations of $\sigma_f$ and $\sigma_{fi}$.

Theorem 1.11 Given descriptor system (8.1). Suppose orthogonal matrices
$U, V, P$ and $Q$ and the condensed forms (8.10) and (8.19) have been determined.

(i) Assume that DDP is solvable. Then

\[ \sigma_f = \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}). \tag{8.23} \]

(ii) Assume that DDPI is solvable. Then

\[ \sigma_{fi} = \sigma_f = \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}). \tag{8.24} \]
(8.24) 



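Proof (i) Since DDP is solvable, the conditions (8.14, 8.15, 8.16) hold. In the
following we first show that $\sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}) \subset \sigma_f$.

Let $F \in \mathbb{R}^{m\times n}$ be any matrix such that $(E, A + BF)$ is regular and
$C(sE - A - BF)^{-1}G = 0$. Denote

\[ FV = \begin{bmatrix} F_1 & F_2 & F_3 & F_4 \end{bmatrix}, \]

with column blocks of sizes $n_1, n_2, n_3, n_4$. Then, using the form (8.10), we have that

\[
\begin{aligned}
n_1 + n_2 + n_3 + n_4 = n
&= \operatorname{rank}_g(sE - A - BF) + \operatorname{rank}_g\big(C(sE - A - BF)^{-1}G\big)\\
&= \operatorname{rank}_g\begin{bmatrix} sE - A - BF & G\\ C & 0\end{bmatrix}\\
&= n_2 + n_3 + n_4 + \tilde n_1 + \operatorname{rank}_g\begin{bmatrix} sE_{21} - A_{21} - B_2F_1\\ -A_{31} - B_3F_1\end{bmatrix},
\end{aligned}
\]

i.e.,

\[
n_1 = \tilde n_1 + \operatorname{rank}_g\begin{bmatrix} sE_{21} - A_{21} - B_2F_1\\ -A_{31} - B_3F_1\end{bmatrix}. \tag{8.25}
\]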
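
The step in the rank chain that replaces the sum of two generic ranks by the rank of one block matrix rests on a standard block factorization. A short sketch, writing $M(s) = sE - A - BF$ (a shorthand introduced here, not in the original) and assuming $M(s)$ is invertible as a rational matrix, which regularity of $(E, A + BF)$ provides:

```latex
\begin{bmatrix} M(s) & G\\ C & 0 \end{bmatrix}
=
\begin{bmatrix} I & 0\\ C\,M(s)^{-1} & I \end{bmatrix}
\begin{bmatrix} M(s) & 0\\ 0 & -C\,M(s)^{-1}G \end{bmatrix}
\begin{bmatrix} I & M(s)^{-1}G\\ 0 & I \end{bmatrix},
\qquad\text{so}\qquad
\operatorname{rank}_g\begin{bmatrix} M(s) & G\\ C & 0 \end{bmatrix}
= \operatorname{rank}_g M(s) + \operatorname{rank}_g\big(C\,M(s)^{-1}G\big).
```

The two outer factors are unimodular over the rational functions, so they do not change the generic rank; with $C\,M(s)^{-1}G = 0$ the right-hand side reduces to $n$.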
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Compute the generalized upper triangular form of 



sEii - A21 - 52^1 
— A31 — Bt,Fi 
Demmel and Kagstrom [14]) to get orthogonal matrices U and V such that 



(see 



Hi 



n\ — n\ 



U 



sEji ~ A21 - B2F1 
— A31 — BiFi 



V 



SE21 - A21 SE21 - A21 

^£31-A31 



}«2 + h'i — ^2 



where £21 is of full row rank and sEt,\ — A31 is of full column rank for any s ^C. 
Set 

h\ n\ — h\ 

{sEu-An-B^F^)V= [sEn - Mi j£ii-A„,] 



U 



B2 
B3 



B2 
B3 



}«2 + M3 - /^2 



From (8.25) we obtain that 



or equivalently, 



Thus, 



sEu -All 
SE21 - A21 



«i ^ h] + H2 + {ni — hi), 

hi — hi + fi2- 
is square. From Theorem 1.8, «6 = «4, and so we have 



£11 
£21 



All 
A21 



U<t(£53,A53)c(£,A + BF). 



Now we derive the relationship between (t(032,<I*32) and a 
Using the form (8.19) we know for a i'o G C that 



£11 

£21 



rank 



■so^ii —All Bi 

S0E21 -A21 B2 

-A31 B, 



< rank 



i'£ii-Aii Bi 

SE21 - A21 B2 

-A31 53 



(8.26) 



All 
A21 



(8.27) 



if and only if sq € (t(032, ^7,1) ■ 
But, the property (8.11) gives that 



rank sE 



31 -^31 03 



53 1 = M2 + «3 - M2j ^■^ £ C, 



which yields that if s^) ^ a 



Ell 
E21 



All 
A21 



then 
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rank 



SqEi 


i-An fill 






S0E21 — A2I -S2 






-A31 B3. 








'sqEu -All - BiFi 


5i" 


■auk 


S0E21 -A21 - B2F1 


52 




_ -A31 - B^Fi 


53. 




sqEh -All 


^o£ii 


-All 



= rank 



^0^21 



^oA21 

sqEh 



All 


fii" 


A21 


B2 


A31 


B3. 



(ni + /ij) + («2 + "3 - fii) 

sEu -All 
«i + «2 + «3 = rank 



.v£2i - A21 B2 
-A31 B3 



So, (8.27) and (8.28) imply that ai&n^^n) C a 



En 
E21 



All 

A21 



(8.28) 



Therefore, (8.26) gives that $\sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}) \subset \sigma(E, A + BF)$; consequently, we have

\[ \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}) \subset \sigma_f. \tag{8.29} \]

Next, let us consider the form (8.21) and denote




XP 

Since 12 + Ti + 13 = wi , and 
«2 = rank 



sEi2 -A12 




^£'22 - A22 


= 


5^32 - A32 _ 





.?£:32 - A32 

■J£'l2-Ai2 
j'£'22 - A22 
SE42 - A42 



}^2 

H4 



«1 



£11 Gi 
E21 



= rank 



©32 






G3 

Gi 



G2 



= ^2 



Tl 



C2, 



so, the condition (8.15) is equivalent to that ^2 ^^3. Note that (8.22) holds, by 
Lemma 1.6(i), there is an integer A: > satisfying that for any conjugate set Ai = 
{li, . . .,Xk} there exist matrices Fi,Fi and a nonsingular matrix Z such that 









Tl 


C2 


t3-C2 




r- Oil -^1^1 


-613 -^'iF," 




iZ-Ou 


-6<'3' 


-Of)- 


}tl 


-621 - 'i>2Fi 


-$23 - "Vlh 


z = 


"O21 


-'i>^3' 


-Oi? 


Hi' 


-641 - ^4^1 


-642 - ^4^3 













H4 



(8.30) 
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where 



7 0' 




' 


_<i>21 3)^3^ 




is 


/ 

a 


7 0' 




, 






\ 


'- 







is regular and 



Oil ola' 
O21 $^3' 



A, 



(8.31) 



Additionally, *J'4 and £42 are of full row rank and the condition (8.14) gives that 
C4 + «4 = (t3 — ^2) + "2j SO by Lemma 1.4, for any conjugate set A2 = 

£42 



{Xi , . . ., i^} with k — rank 



£'42 



£42 




■^4^3 


A42 + 4^4^ 


£42. 




A42 






a 
V 


'O £42' 
£42. 


' 



, there exist matrices £3, £2 such that 
is regular, of index at most one, and 



^4£3 A42 + ^4£2 
A42 



Let £2 satisfy O42 + *P4£2 = 0. Set 

£i = ([£2 £1 £3] + [0 £3])^"', £=[£1 £2 o]y^ 
Then, we have 



.32) 



U{sE~A~BF)V 



XPO 
/ 

.J032-O32 





7,, 0' 


QY 0' 






Z 


. /. 






0/ 



si-^n -^'^ 











-O21 







^23 










-»I'4£3 ^£42-A42-*P4£2 
.j£42-A42 







^•£53 -A53 




5£64 " A64 

(8.33) 



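A simple calculation yields that $(E, A + BF)$ is regular,

\[
\sigma(E, A + BF) = \sigma(\Theta_{32},\Phi_{32}) \cup \Lambda_1 \cup \Lambda_2 \cup \sigma(E_{53},A_{53}),
\qquad
C(sE - A - BF)^{-1}G = 0.
\]

Since the conjugate sets $\Lambda_1$ and $\Lambda_2$ can be chosen arbitrarily, this means that

\[ \sigma_f \subset \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}). \tag{8.34} \]

Therefore, (i) follows directly from (8.29) and (8.34).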

(ii) Assume DDPI is solvable. Thus, the conditions (8.14, 8.15, 8.17, 8.18) hold.
Note that if $F \in \mathbb{R}^{m\times n}$ solves the DDPI then it also solves the DDP, so

\[ \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}) = \sigma_f \subset \sigma_{fi}. \tag{8.35} \]



Moreover, 
• the condition (8.18) gives that 



$23 ^2 
<i>43 ^4 



Fi,Ft, in (8.31) can be chosen such that 

index at most one, and (8.31) is true; 

• it follows from the form (8.21) that rank 

• because 



rank 



= S,2 + ^A, thus, by Lemma 1.6 (ii), 
is regular, of 



/ 




•ill Ols^ 



$,, O 



23 



£11 

Ei\ 



T2 + Ti; 



£11 


£12 


Gi" 




£21 




£22 
£32 






= rank 





£42 








©32 £32 G3 

/ £12 Gi 

£22 G2 

£42 

£42 



rank 



£11 G\ 

£21 



= rank 



032 


63' 





/ Gi 





G2 



and Gi, £21,62 are of full row rank, ©32 is nonsingular, we have 



rank 



£32 

£42 



= rank 



£42 
£42 



Hence, the F in (8.32) satisfies that 

deg(det(.j£ - A -BF)) = deg(det(s032 - O32)) + deg ( det 



-deg det 



-O21 
"VaF^ sEa2-Aa2-'¥aF2 
^£42 — A42 

deg(det(5£53 -A53)) +deg(det(.s£64 -Am)) 



-m 



-4) 



1) 



13 
^23 



T2 + T1 +rank 


£42 
.£42 


+ n 


rank 


■£11' 

£21. 


+ r 


ank 


£32" 
£42, 



: rank(£) . 
Hence, $(E, A + BF)$ is additionally of index at most one. This means that $F$
solves the DDPI, which together with (8.33) gives that

\[ \sigma_{fi} \subset \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}). \tag{8.36} \]

Therefore, (ii) follows from (8.35) and (8.36). $\Box$

Theorem 1.11 implies that when the DDPI is solvable, the fixed poles in the
DDP and DDPI are the same, although the index requirement is not imposed in
the DDP.

As a direct consequence of the proof of Theorem 1.11 we can obtain solvability 
conditions for the DDP and DDPI with stability. 

Corollary 1.12 Given descriptor system (8.1). Assume that orthogonal matrices
$U, V, P$ and $Q$ and the condensed forms (8.10) and (8.19) have been determined.
Let $\mathbb{C}^-$ denote the open left half complex plane.

(i) The DDP with stability is solvable, i.e., there exists an $F \in \mathbb{R}^{m\times n}$ such that
$(E, A + BF)$ is regular, stable and (8.3) holds, if and only if the conditions
(8.14, 8.15, 8.16) hold and $\sigma_f \subset \mathbb{C}^-$.
(ii) The DDPI with stability is solvable, i.e., there exists an $F \in \mathbb{R}^{m\times n}$ such that
$(E, A + BF)$ is regular, of index at most one, stable and (8.3) holds, if and
only if the conditions (8.14, 8.15, 8.17, 8.18) hold and $\sigma_{fi} \subset \mathbb{C}^-$.

Based on Theorems 1.8 and 1.11, we can compute $\sigma_f$ and $\sigma_{fi}$ via the following
algorithm.

Algorithm 3 

Input: $E, A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{q\times n}$, $G \in \mathbb{R}^{n\times p}$.

Output: $\sigma_f$ or $\sigma_{fi}$ (if possible).

Step 1 Compute the condensed form (8.10) (see Chu and Mehrmann [7]). If
(8.14) or (8.15) is not true, print "DDP and DDPI are not solvable" and stop.
Otherwise, continue.

Step 2 Check (8.17) and (8.18). If (8.17) or (8.18) is not true, print "DDPI is not
solvable".

Step 3 Perform Algorithm 4 in Appendix A to compute the condensed form
(8.19). If $\tilde\tau_4 \ne \tau_3$, print "DDP and DDPI are not solvable" and stop. Otherwise,
continue.

Step 4 Compute $\sigma(\Theta_{32},\Phi_{32})$ and $\sigma(E_{53},A_{53})$.

Step 5 If DDPI is solvable, set $\sigma_f = \sigma_{fi} = \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53})$, and
output $\sigma_f$ and $\sigma_{fi}$. If only DDP is solvable, set $\sigma_f = \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53})$, and
output $\sigma_f$.
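
The branching of Algorithm 3 can be sketched in a few lines of Python. This is only an illustration of the control flow: the function name, the argument names and the returned dictionary are hypothetical, and the truth values of conditions (8.14), (8.15), (8.17), (8.18), the block sizes from (8.19) and the two sets of generalized eigenvalues are all assumed to have been computed beforehand by the orthogonal reductions of Steps 1, 3 and 4.

```python
def fixed_poles_decision(cond_814, cond_815, cond_817, cond_818,
                         tau4_tilde, tau3, eig_theta_phi, eig_e53_a53):
    """Mirror the branching of Steps 1-5; every input is assumed precomputed."""
    result = {"messages": [], "sigma_f": None, "sigma_fi": None}

    # Step 1: conditions (8.14) and (8.15) are needed by both problems.
    if not (cond_814 and cond_815):
        result["messages"].append("DDP and DDPI are not solvable")
        return result

    # Step 2: conditions (8.17) and (8.18) are needed only for the DDPI.
    ddpi_solvable = cond_817 and cond_818
    if not ddpi_solvable:
        result["messages"].append("DDPI is not solvable")

    # Step 3: condition (8.16) is tested through the block sizes of (8.19).
    if tau4_tilde != tau3:
        result["messages"].append("DDP and DDPI are not solvable")
        return result

    # Steps 4-5: the fixed poles are the union of the two spectra.
    poles = list(eig_theta_phi) + list(eig_e53_a53)
    result["sigma_f"] = poles
    if ddpi_solvable:
        result["sigma_fi"] = list(poles)
    return result
```

Keeping the numerical reductions outside this function reflects the point made in the text: all the floating-point work is done by orthogonal transformations, and the solvability logic itself only compares integers and truth values.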

Note that Algorithm 3 is only based on orthogonal transformations, hence, it 
can be implemented in a numerically reliable manner via the existing numerical 
algebra tools such as LAPACK and Matlab. 

Now we present an example to illustrate the use of Algorithm 3. All calculations
were carried out in MATLAB 5.0 with four decimal places display on a HP
712/80 workstation with IEEE standard arithmetic (machine accuracy $\epsilon = 10^{-16}$). Since we
are only interested in the solvability conditions (8.14-8.18) and the eigenvalues of
the pencils $s\Theta_{32} - \Phi_{32}$ and $sE_{53} - A_{53}$, we do not store the orthogonal matrices
$U, V, P$ and $Q$ in the forms (8.10) and (8.19).



Example 1.13 Given descriptor system (8.1) with

\[
E = \begin{bmatrix}
-0.7863 & 0.4021 & 0.2314 & -0.2600 & -0.7066 & -0.1875 & -1.1030\\
-0.1538 & -0.2767 & 0.1409 & -0.0918 & 0.0615 & 0.1701 & -0.2873\\
2.0846 & -1.7034 & -0.5839 & 0.1624 & 0.8925 & 0.1831 & 0.5748\\
1.0141 & -1.2203 & -0.0836 & -0.0557 & 0.5309 & 0.5997 & 0.1480\\
-0.0925 & -0.0113 & 0.0298 & 0.0278 & 0.1035 & -0.0039 & 0.0718\\
1.0492 & -1.6596 & -0.2188 & 0.1829 & 1.2403 & 0.1666 & 0.4429\\
-0.0987 & 0.1941 & 0.0184 & 0.0202 & -0.0431 & 0.0044 & 0.1207
\end{bmatrix},
\]

\[
A = \begin{bmatrix}
6.3555 & -4.5744 & -1.8617 & 2.0120 & 6.1926 & 1.8761 & 7.2305\\
0.56320 & 2.0114 & -1.2514 & 1.2150 & 0.1732 & -0.60900 & 2.4890\\
-14.319 & 11.934 & 3.8888 & -1.5240 & -6.5104 & -1.7185 & -3.8026\\
-7.2305 & 8.1554 & 0.68420 & -0.13180 & -3.7048 & -4.2523 & -1.2279\\
0.97910 & -0.12010 & -0.09360 & -0.36710 & -0.84600 & 0.04080 & -0.45740\\
-7.7576 & 12.035 & 1.1888 & -1.5432 & -8.9678 & -1.0801 & -2.8889\\
0.25280 & -0.61880 & -0.17930 & 0.15880 & 0.17470 & 0.51260 & -0.40620
\end{bmatrix},
\]

\[
B = \begin{bmatrix} 0.0655\\ 0.1149\\ -0.0266\\ -0.0809\\ 0.0932\\ -0.0935\\ 0.0549 \end{bmatrix},
\qquad
G = \begin{bmatrix}
0.4937 & -0.2152\\
0.1761 & -0.2545\\
-0.3581 & -0.0736\\
-0.1451 & -0.2297\\
0.0116 & -0.0491\\
-0.0759 & -0.5923\\
-0.0460 & 0.1003
\end{bmatrix},
\]

\[
C = \begin{bmatrix}
-0.2196 & -0.5108 & 0.0312 & 0.3180 & 0.2294 & 0.1753 & -0.2892\\
0.0506 & 0.0161 & -0.1879 & 0.4957 & -0.3728 & 0.2024 & -0.2401
\end{bmatrix}.
\]

Our purpose is to check if DDP and DDPI are solvable, and if so, to compute $\sigma_f$
and $\sigma_{fi}$.



First, we compute the condensed form (8.10). The computed $(UEV, UAV, UB,
UG, CV)$ and the parameters $n_i, \tilde n_i$, $i = 1, \ldots, 4$, and $\tilde n_6$ are as follows:
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UEV 



UAV = 



UB = 



CV 



' -1.2184 


-0.6681 


0.2716 


-0.2559 


-0.8668 


-1.6301 


-0.8093 " 


-0.6934 
-1.4498 


-0.3802 
-0.7950 


0.1545 
0.3231 


-0.1456 
-0.3045 


-0.4933 
-1.0315 
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3, n2 = 2, 



0.6457 
0.5361 



0.4017 
-0.4837 



$n_3 = n_4 = 1$, $\tilde n_1 = 1$, $\tilde n_2 = 2$, $\tilde n_3 = \tilde n_4 = \tilde n_6 = 1$.



A simple calculation gives that the conditions (8.14, 8.15, 8.17, 8.18) hold. Hence, 
by Theorem 1.8, the DDPI is solvable (and therefore DDP is also solvable). 
Then we compute the condensed form (8.19) as follows: 



sEn - An 

SE21 - A21 

-A21 



Q = 



- 0.4364 
0.4338 


1.4954,s + 10.748 
0.7735s + 5.6018 
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Bi 



= P 
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0.3740 
0.1918 
0.4143 
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r2 = T'3 = 1, 



0.1578 

0.0809 

0.1748 
-0.6860 
n = 2, f2 = 0, fi = 1. 



■ -0.6573 



0.0785 - 
-0.7256 








L 


J 



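Finally,

\[ s\Theta_{32} - \Phi_{32} = 0.0648s + 0.0207, \qquad sE_{53} - A_{53} = 0.7266s + 4.3489, \]

and, by Theorem 1.11, we have

\[ \sigma_{fi} = \sigma_f = \sigma(\Theta_{32},\Phi_{32}) \cup \sigma(E_{53},A_{53}) = \{-0.31944,\ -5.9853\} \subset \mathbb{C}^-. \]

Hence, both DDP and DDPI with stability are solvable.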
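
As a quick sanity check on these numbers, the two fixed poles can be recomputed from the displayed pencil coefficients. The sketch below assumes the two pencils are the scalar first-order polynomials $0.0648s + 0.0207$ and $0.7266s + 4.3489$ (four-decimal display values, so agreement is only to display precision); the helper name is hypothetical.

```python
def scalar_pencil_eigenvalue(e, a):
    """Finite eigenvalue of the 1x1 pencil s*e - a, i.e. the root of s*e - a = 0."""
    if e == 0:
        raise ValueError("zero leading coefficient: the eigenvalue is infinite")
    return a / e

# s*Theta32 - Phi32 = 0.0648 s + 0.0207  ->  Theta32 = 0.0648, Phi32 = -0.0207
# s*E53     - A53   = 0.7266 s + 4.3489  ->  E53     = 0.7266, A53   = -4.3489
fixed_poles = [
    scalar_pencil_eigenvalue(0.0648, -0.0207),
    scalar_pencil_eigenvalue(0.7266, -4.3489),
]

# Stability in the sense of Corollary 1.12: every fixed pole in the open left half plane.
all_stable = all(p.real < 0 for p in fixed_poles)
```

Both roots are real and negative, which matches the conclusion that the DDP and DDPI with stability are solvable for this example.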
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In this paper we have studied the fixed poles in disturbance decoupling for 
descriptor systems. Algebraic characterizations for these fixed poles are derived 
based on two condensed forms under orthogonal transformations. These algebraic 
characterizations lead to a numerically reliable algorithm for computing the fixed 
poles. This algorithm can be implemented directly using existing numerical linear 
algebra tools such as LAPACK and Matlab. 



Appendix A: Proof of Theorem 1.9(i) 

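Proof We prove Theorem 1.9(i) constructively by means of the following algorithm.

Algorithm 4

Input: $E_{11}, E_{21}, A_{11}, A_{21}, A_{31}, B_1, B_2$ and $B_3$ in the form (8.10).
Output: Orthogonal matrices $P$ and $Q$ and the form (8.19).

Step 1 Perform a QR factorization of $\begin{bmatrix} B_1\\ B_2\\ B_3\end{bmatrix}$ to get an orthogonal matrix $P_1$ such
that

\[
P_1\begin{bmatrix} B_1\\ B_2\\ B_3\end{bmatrix} = \begin{bmatrix}\Psi_1\\ 0\end{bmatrix},
\]

with row blocks of sizes $\tilde\tau_1$ and $\tilde n_1 + \tilde n_2 + \tilde n_3 - \tilde\tau_1$, where $\Psi_1$ is of full row rank. Set

\[
P_1\begin{bmatrix} sE_{11}-A_{11}\\ sE_{21}-A_{21}\\ -A_{31}\end{bmatrix}
=: \begin{bmatrix} s\Theta^{(1)}_{11} - \Phi^{(1)}_{11}\\ s\Theta^{(1)}_{21} - \Phi^{(1)}_{21}\end{bmatrix}.
\]
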
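Step 2 Compute the generalized upper triangular form of $s\Theta^{(1)}_{21} - \Phi^{(1)}_{21}$ (see
Demmel and Kågström [14]) to get orthogonal matrices $P_2$ and $Q$ such that

\[
P_2\big(s\Theta^{(1)}_{21} - \Phi^{(1)}_{21}\big)Q =
\begin{bmatrix}
s\Theta_{21}-\Phi_{21} & s\Theta_{22}-\Phi_{22} & s\Theta_{23}-\Phi_{23}\\
0 & s\Theta_{32}-\Phi_{32} & s\Theta_{33}-\Phi_{33}\\
0 & 0 & s\Theta_{43}-\Phi_{43}
\end{bmatrix},
\]

with column blocks of sizes $\tau_1, \tau_2, \tau_3$ and row blocks of sizes $\tilde\tau_2, \tau_2, \tilde\tau_4$,
where $\Theta_{21}$ is of full row rank, $\Theta_{32}$ is nonsingular, and the property (8.20)
holds. Set

\[
\big(s\Theta^{(1)}_{11} - \Phi^{(1)}_{11}\big)Q =:
\begin{bmatrix} s\Theta_{11}-\Phi_{11} & s\Theta_{12}-\Phi_{12} & s\Theta_{13}-\Phi_{13}\end{bmatrix}
\]

and

\[
P = \begin{bmatrix} I & 0\\ 0 & P_2\end{bmatrix}P_1.
\]

Then $P$ and $Q$ give the form (8.19).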



Appendix B: Proof of Lemma 1.10 

Proof Since $\Theta_{32}$ is nonsingular and $\operatorname{rank}(s\Theta_{43} - \Phi_{43}) = \tau_3$ for any $s \in \mathbb{C}$, by
matrix pencil theory (see Gantmacher [1]), there exist matrices $\mathcal{X}_{34}$ and $\mathcal{Y}_{23}$ such
that
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Note that T3 = 14, so if we let 
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Because 
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Bi 






is of full row rank, so. 



©11 ©13 
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Gi 
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G4 



is also 



of full row rank. Thus, there exist nonsingular matrices Z22 and Y22 such that 
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where G2 and ^4 are of full row rank. Considering that 



and 



.©n-(l)„ % 
..?©2i-4'2i 

.s©43 — $43 are of full row rank for any .s e C (note that 13 = 14), so (8.22) holds 
Therefore, Lemma 1.10 is proved with 
Chapter 9

Robust Control of Discrete Linear
Repetitive Processes with Parameter
Varying Uncertainty

Błażej Cichy, Krzysztof Gałkowski, Eric Rogers and Anton Kummert



Abstract Repetitive processes propagate information in two independent directions
where the duration of one of these is finite.

They pose control problems that cannot be solved by application of results for 
other classes of 2D systems. This paper develops robust controller design algo- 
rithms for discrete linear processes based on the poly-quadratic stability that 
produce less conservative results than currently available alternatives. 



9.1 Introduction 

Repetitive processes are a distinct class of 2D systems of both system theoretic and 
applications interest whose unique characteristic is a series of sweeps, termed passes, 
through a set of dynamics defined over a fixed finite duration known as the pass 
length. On each pass an output, termed the pass profile, is produced which acts as a 
forcing function on, and hence contributes to, the dynamics of the next pass profile.
This, in turn, leads to the unique control problem in that the output sequence of pass
profiles can contain oscillations whose amplitude increases in the pass-to-pass direction.

To introduce a formal definition, let the integer $\alpha < +\infty$ denote the number of
samples resulting from sampling the assumed constant pass length at a constant
rate. Then in a repetitive process the pass profile $y_k(p)$, $0 \le p \le \alpha - 1$, generated on
pass $k$ acts as a forcing function on, and hence contributes to, the dynamics of the
next pass profile $y_{k+1}(p)$, $0 \le p \le \alpha - 1$, $k \ge 0$.

Physical examples of these processes include long-wall coal cutting and metal 
rolling operations [17]. Also in recent years applications have arisen where 
adopting a repetitive process setting for analysis has distinct advantages over 
alternatives. An example of these algorithmic applications is iterative algorithms 
for solving nonlinear dynamic optimal stabilization problems based on the maxi- 
mum principle [16] where use of the repetitive process setting provides the basis for 
the development of highly reliable and efficient iterative solution algorithms. 
A second example is iterative learning control schemes which form one approach to 
controlling systems operating in a repetitive (or pass-to-pass) mode with the 
requirement that a reference trajectory $r(p)$ defined over a finite interval
$0 \le p \le \alpha - 1$ is followed to a high precision (see, for example, [11]). In this case a
repetitive process setting for analysis provides a stability theory which, unlike many 
alternatives, allows for design to meet pass-to-pass error convergence and control 
of the along the pass dynamics. Also iterative learning control laws designed in this 
setting have been experimentally verified [10] on a gantry robot with very good 
correlation between simulation and actually measured performance. 

Attempts to analyze repetitive processes using standard (or 1D) systems theory/
algorithms fail (except in a few very restrictive special cases) precisely because
such an approach ignores their inherent 2D systems structure, i.e. information 
propagation occurs from pass-to-pass and along a given pass. Also the initial 
conditions are reset before the start of each new pass and the structure of these can 
be somewhat complex. For example, if the pass state initial vector is an explicit 
function of the pass profile vector at points along the previous pass then this alone 
can destroy the most basic performance specification of stability. In seeking a 
rigorous foundation on which to develop a control/estimation/filtering theory for 
these processes, it is natural to attempt to exploit structural links which exist with 
other classes of 2D linear systems. 

The case of 2D discrete linear systems recursive in the positive quadrant
$\{(i, j) : i \ge 0,\ j \ge 0\}$ (where $i$ and $j$ denote the directions of information propagation) has
been the subject of much research effort over the years using, in the main, the well
known Roesser and Fornasini-Marchesini state-space models (for details on these
see, for example, the references given in [17]). More recently, productive research
has been reported on $H_\infty$ and $H_2$ approaches to filtering and control law design
(see, for example, [3, 19]). (Filtering of this general form is, of course, well
established in 1D linear systems theory; see, for example, [8, 13, 18].)

As noted above, the structure of the boundary conditions for linear repetitive
processes can cause problems which have no Roesser or Fornasini-Marchesini
state-space model counterparts. Moreover, there are key systems theoretic properties
for repetitive processes that have no interpretation in terms of these (and
other) 2D systems models. An example here is pass profile controllability [17] that
is the physically motivated requirement that there exists a control input sequence 
which will force the process to produce a pre-specified pass profile on a given pass. 
This means that the systems theory for other classes of 2D discrete linear systems 
is very often not applicable to repetitive processes. 

As noted above, material, or metal, rolling is one of a number of physically 
based problems which can be modeled as a linear repetitive process [17]. In this 
paper, we use a material rolling process as a basis to illustrate the solution we 
develop to a currently open robust stability and stabilization problem for discrete 
linear repetitive processes. The design itself can be completed using Linear Matrix 
Inequalities (LMIs) [4, 12]. 

In physical application terms, the system or process parameters are most often 
not known exactly and only some nominal values or admissible intervals are 
available. Hence, although the nominal process is most often time invariant, 
the uncertain process can be time-varying. This will be the case here and to solve 
the problem we generalize previously reported LMI based design algorithms for 
uncertain discrete linear repetitive processes [5] and other classes of systems 
[2, 9, 15] with polytopic uncertainty. These results are based on sufficient, but not 
necessary, stability conditions. 

The use of sufficient but not necessary stability conditions obviously creates a 
potentially serious problem since the results obtained can be very conservative in 
the sense that the range over which the admissible parameters can vary is very 
small. Here we develop substantial new results on how this problem can be 
overcome. The essential mechanism used is to allow a control law where the
entries in the defining matrices explicitly depend on the pass number $k$ and the
along the pass variable $p$, $k \ge 0$ and $0 \le p \le \alpha - 1$. This is a form of adaptation
present in the model used for control law design and evaluation. In addition to 
reducing the conservativeness of the design, this will also allow the admissible 
uncertainty set over which a satisfactory solution exists to be enlarged. 

Throughout this chapter, the null matrix and the identity matrix with appropriate
dimensions are denoted by $0$ and $I$, respectively. Moreover, a real symmetric
positive definite matrix, say $N$, is denoted by $N \succ 0$. Next we describe the modeling
of the material rolling process used in this paper.



9.2 Material Rolling as a Linear Repetitive Process

Material rolling is an extremely common industrial process where, in essence,
deformation of the workpiece takes place between two rolls with parallel axes
revolving in opposite directions. Consider also the following differential equation
model of the metal rolling process shown schematically in Fig. 9.1, whose
derivation is explained fully in [1] (see also [7] for other repetitive process based
analysis of this model)

\[ \ddot y_k(t) + a_0 y_k(t) + b_2 \ddot y_{k-1}(t) + b_0 y_{k-1}(t) = c_0 u_k(t) \tag{9.1} \]

Fig. 9.1 Schematic of material rolling (metal strip with previous pass thickness $y_{k-1}(t)$)



where

\[
a_0 = \frac{\lambda_1\lambda_2}{M(\lambda_1 + \lambda_2)}, \qquad
b_2 = \frac{-\lambda_2}{\lambda_1 + \lambda_2}, \qquad
b_0 = \frac{-\lambda_1\lambda_2}{M(\lambda_1 + \lambda_2)}, \qquad
c_0 = \frac{-\lambda_1}{M(\lambda_1 + \lambda_2)},
\]

and $u_k(t) = F_{M,k}(t)$. Here $y_k(t)$ and $y_{k-1}(t)$ denote the thickness of the material on
two successive passes, $M$ is the lumped mass of the roll-gap adjusting mechanism,
$\lambda_1$ is the stiffness of the adjustment mechanism springs, $\lambda_2$ is the hardness of
the material strip, and $F_{M,k}(t)$ is the force developed by the motor.

Applying the backward Euler discretization method with sampling period $T$ to
(9.1) leads to the following state-space model, which is a special case of that for
discrete linear repetitive processes

\[
\begin{aligned}
x_{k+1}(p+1) &= A x_{k+1}(p) + B u_{k+1}(p) + B_0 y_k(p)\\
y_{k+1}(p) &= C x_{k+1}(p) + D u_{k+1}(p) + D_0 y_k(p)
\end{aligned}
\tag{9.2}
\]
and

\[
x_k(p) = \begin{bmatrix}
y_{k+1}(p-1) + b_2 y_k(p-1)\\
\dfrac{1}{T}\big(y_{k+1}(p-1) + b_2 y_k(p-1) - y_{k+1}(p-2) - b_2 y_k(p-2)\big)
\end{bmatrix}.
\]

In the general case on pass $k$, $x_{k+1}(p)$ is the $n \times 1$ current pass state vector, $y_k(p)$ is
the $m \times 1$ pass profile vector and $u_k(p)$ is the $l \times 1$ current pass input vector.

To complete the process description it is necessary to specify the initial, or 
boundary, conditions, that is, the pass state initial vector sequence and the initial 
pass profile. Here these are taken to be of the form 

\[
\begin{aligned}
x_{k+1}(0) &= d_{k+1}, & k &\ge 0\\
y_0(p) &= f(p), & 0 &\le p \le \alpha - 1
\end{aligned}
\tag{9.3}
\]

where $d_{k+1}$ is an $n \times 1$ vector with constant entries and $f(p)$ is an $m \times 1$ vector
whose entries are known functions of $p$.
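
To make the two directions of information propagation concrete, the following minimal sketch simulates the scalar case ($n = m = l = 1$) of the state-space model (9.2) with the boundary conditions (9.3). The function name, the default initial profile $f(p) = 1$ and the default zero input are illustrative assumptions, not taken from the chapter.

```python
def simulate(A, B, B0, C, D, D0, alpha, passes, d=0.0, f=None, u=None):
    """Simulate scalar (9.2)-(9.3); returns the list [y_0, y_1, ..., y_passes]."""
    if f is None:
        f = lambda p: 1.0          # assumed initial pass profile y_0(p)
    if u is None:
        u = lambda k, p: 0.0       # assumed zero control input u_{k+1}(p)

    y_prev = [f(p) for p in range(alpha)]
    profiles = [y_prev]
    for k in range(passes):
        x = d                      # state is reset at the start of each pass
        y_cur = []
        for p in range(alpha):
            # Output equation uses the current state and the previous pass profile.
            y_cur.append(C * x + D * u(k, p) + D0 * y_prev[p])
            # State update along the pass, driven by the previous pass profile.
            x = A * x + B * u(k, p) + B0 * y_prev[p]
        profiles.append(y_cur)
        y_prev = y_cur
    return profiles


profiles = simulate(A=0.5, B=1.0, B0=0.1, C=1.0, D=0.0, D0=0.5, alpha=5, passes=3)
```

The inner loop runs along the pass (index `p`), the outer loop runs from pass to pass (index `k`), and the only coupling between passes is through `y_prev`, which is exactly the forcing-function role of the previous pass profile described above.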



9.3 Stability and Stabilization of Discrete Linear 
Repetitive Processes 

The stability theory [17] for linear repetitive processes is based on an abstract 
model in a Banach space setting that includes a wide range of such processes as 
special cases, including those described by (9.2) and (9.3). In terms of their 
dynamics it is the pass-to-pass coupling (noting again their unique feature) which 
is critical. This is of the form $y_{k+1} = L_\alpha y_k$, where $y_k \in E_\alpha$ ($E_\alpha$ a Banach space with
norm $\|\cdot\|$) and $L_\alpha$ is a bounded linear operator mapping $E_\alpha$ into itself. (In the case
of processes described by (9.2) and (9.3), $L_\alpha$ is a discrete linear systems convolution
operator.)

Asymptotic stability, i.e. bounded-input bounded-output (BIBO) stability over
the fixed finite pass length $\alpha > 0$, requires the existence of finite real scalars
$M_\alpha > 0$ and $\lambda_\alpha \in (0, 1)$ such that $\|L_\alpha^k\| \le M_\alpha \lambda_\alpha^k$, $k \ge 0$, where $\|\cdot\|$ also denotes the
induced operator norm. For the processes described by (9.2) and (9.3) it has been
shown elsewhere (see, for example, Chap. 3 of Rogers et al. [17]) that this
property holds if, and only if, all eigenvalues of the matrix $D_0$ have modulus
strictly less than unity, written here as $r(D_0) < 1$ where $r(\cdot)$ denotes the spectral
radius of its matrix argument.

Suppose that $r(D_0) < 1$ and the input sequence applied $\{u_{k+1}\}_{k \ge 0}$ converges
strongly as $k \to \infty$ (i.e. in the sense of the norm on the underlying function space)
to $u_\infty$. Then the strong limit $y_\infty := \lim_{k\to\infty} y_k$ is termed the limit profile corresponding
to this input sequence and its dynamics are described by

\[
\begin{aligned}
x_\infty(p+1) &= \big(A + B_0(I - D_0)^{-1}C\big)x_\infty(p) + \big(B + B_0(I - D_0)^{-1}D\big)u_\infty(p)\\
y_\infty(p) &= (I - D_0)^{-1}Cx_\infty(p) + (I - D_0)^{-1}Du_\infty(p)\\
x_\infty(0) &= d_\infty,
\end{aligned}
\tag{9.4}
\]

where (again a strong limit) $d_\infty := \lim_{k\to\infty} d_k$. In physical terms, this result states
that under asymptotic stability the repetitive dynamics can, after a "sufficiently
large" number of passes have elapsed, be replaced by those of a 1D discrete linear
system. In particular, this property demands that the amplifying properties of the
coupling between successive pass profiles are completely damped out after a
sufficiently large number of passes have elapsed. This fact has clear implications in
terms of the control of these processes (see [17] for a detailed treatment of this
point).

The finite pass length means that the limit profile can have unacceptable
along the pass dynamics and, in particular, be unstable in the 1D linear systems
sense. A simple example here is the case when $A = -0.5$, $B = 1$, $B_0 = 0.5 + \beta$,
$C = 1$, $D = D_0 = 0$, where $\beta > 0$ is a real scalar. Hence if $|\beta| \ge 1$ the limit profile is
unstable.
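
The scalar example can be checked directly. The sketch below evaluates the scalar form of the limit profile model (9.4); the input gain $B + B_0(I - D_0)^{-1}D$ is obtained here by substituting $y_\infty = (I - D_0)^{-1}(Cx_\infty + Du_\infty)$ into the state equation of (9.2) and should be read as a derived assumption. Function names are hypothetical.

```python
def limit_profile_dynamics(A, B, B0, C, D, D0):
    """Scalar limit-profile model: returns (A_lim, B_lim) of x(p+1) = A_lim x(p) + B_lim u(p)."""
    if abs(D0) >= 1.0:
        raise ValueError("asymptotic stability requires r(D0) < 1")
    gain = 1.0 / (1.0 - D0)            # scalar (I - D0)^{-1}
    A_lim = A + B0 * gain * C
    B_lim = B + B0 * gain * D
    return A_lim, B_lim


def example(beta):
    """The example from the text: A = -0.5, B = 1, B0 = 0.5 + beta, C = 1, D = D0 = 0."""
    return limit_profile_dynamics(A=-0.5, B=1.0, B0=0.5 + beta, C=1.0, D=0.0, D0=0.0)


A_lim, B_lim = example(1.5)
limit_profile_is_stable = abs(A_lim) < 1.0
```

For these data the limit profile state matrix collapses to $\beta$ itself, so asymptotic stability ($r(D_0) = 0 < 1$) holds for every $\beta$ while the along the pass dynamics of the limit profile are stable only for $|\beta| < 1$.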

If we wish to avoid cases such as this example from arising, one route is to
demand the BIBO stability property for any possible value of the pass length
(mathematically this can be analyzed by letting $\alpha \to \infty$). This is the stability along
the pass property which requires the existence of finite real scalars $M_\infty > 0$ and
$\lambda_\infty \in (0, 1)$ that are independent of $\alpha$ and are such that $\|L_\alpha^k\| \le M_\infty \lambda_\infty^k$, $k \ge 0$.

Numerous sets of necessary and sufficient conditions for stability along the pass 
of (9.2) and (9.3) are known but here it is the following result which is the starting 
point. 

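Theorem 1.1 [6]. A discrete linear repetitive process described by (9.2) and (9.3)
is stable along the pass if there exist matrices $W_1 \succ 0$ and $W_2 \succ 0$ such that

\[ \mathbf{A}^T W \mathbf{A} - W \prec 0 \tag{9.5} \]

where $W = \operatorname{diag}\{W_1, W_2\} \succ 0$, and

\[ \mathbf{A} = \begin{bmatrix} A & B_0\\ C & D_0 \end{bmatrix} \tag{9.6} \]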

Even though this condition is sufficient but not necessary it forms a basis of 
control law design via a Lyapunov function interpretation. In particular, follow 
[14] and introduce the candidate Lyapunov function as 

\[ V(k,p) = x_{k+1}^T(p)W_1x_{k+1}(p) + y_k^T(p)W_2y_k(p) \tag{9.7} \]

where $W_1 \succ 0$, $W_2 \succ 0$, with associated increment

\[
\begin{aligned}
\Delta V(k,p) ={}& x_{k+1}^T(p+1)W_1x_{k+1}(p+1) + y_{k+1}^T(p)W_2y_{k+1}(p)\\
&- x_{k+1}^T(p)W_1x_{k+1}(p) - y_k^T(p)W_2y_k(p)
\end{aligned}
\tag{9.8}
\]

Then it is easy to show that

\[ \Delta V(k,p) < 0 \tag{9.9} \]

is equivalent to (9.5).
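
The equivalence can be seen in one line. The following sketch assumes zero control input and writes $\mathbf{A}$ for the augmented matrix of (9.6) and $z(k,p)$ for the stacked vector of current pass state and previous pass profile (the symbol $z$ is introduced here for illustration only):

```latex
z(k,p) := \begin{bmatrix} x_{k+1}(p)\\ y_k(p) \end{bmatrix},
\qquad
\begin{bmatrix} x_{k+1}(p+1)\\ y_{k+1}(p) \end{bmatrix} = \mathbf{A}\, z(k,p)
\quad\Longrightarrow\quad
\Delta V(k,p) = z^T(k,p)\,\big(\mathbf{A}^T W \mathbf{A} - W\big)\, z(k,p),
\qquad W = \operatorname{diag}\{W_1, W_2\}.
```

Hence the increment is negative for every nonzero $z(k,p)$ exactly when the matrix $\mathbf{A}^T W \mathbf{A} - W$ is negative definite, which is the inequality (9.5).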

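An extensively analyzed control law for processes described by (9.2) and (9.3)
(see, for example, [14]) has the following form over $0 \le p \le \alpha - 1$, $k \ge 0$

\[
u_{k+1}(p) = \begin{bmatrix} K_1 & K_2 \end{bmatrix}
\begin{bmatrix} x_{k+1}(p)\\ y_k(p) \end{bmatrix}
\tag{9.10}
\]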



where Ki and K2 are appropriately dimensioned matrices to be designed. In effect, 
this control law is composed of a weighted sum of current pass state feedback and 
feedforward of the previous pass profile. 
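
Substituting this control law into the state-space model (9.2) gives the controlled process explicitly; the short derivation below uses only the matrices already defined and introduces no new assumptions beyond direct substitution:

```latex
\begin{aligned}
x_{k+1}(p+1) &= (A + BK_1)\,x_{k+1}(p) + (B_0 + BK_2)\,y_k(p),\\
y_{k+1}(p)   &= (C + DK_1)\,x_{k+1}(p) + (D_0 + DK_2)\,y_k(p).
\end{aligned}
```

The controlled process therefore has the same structure as (9.2) with zero input, with $A, B_0, C, D_0$ replaced by $A + BK_1$, $B_0 + BK_2$, $C + DK_1$, $D_0 + DK_2$, so any stability test stated for the uncontrolled process can be applied to these four matrices.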

The LMI of (9.5) extends in a natural manner to the design of (9.10) for 
stability along the pass but here we will use the approach based on [2, 9, 15] and 
first adopted for repetitive processes in [6]. This will prove to be of particular use 
in the analysis of the case when there is uncertainty associated with the process 
state-space model. 

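Theorem 1.2 Suppose that a control law of the form (9.10) is applied to a discrete
linear repetitive process described by (9.2) and (9.3). Then the resulting process
is stable along the pass if there exist matrices $W = \operatorname{diag}\{W_1, W_2\}$, $W_1 \succ 0$, $W_2 \succ 0$,
$G$, and

\[ N = \begin{bmatrix} N_1 & N_2\\ N_1 & N_2 \end{bmatrix} \tag{9.11} \]

such that

\[
\begin{bmatrix}
-G - G^T + W & (\mathbf{A}G + \mathbf{B}N)^T\\
\mathbf{A}G + \mathbf{B}N & -W
\end{bmatrix} \prec 0
\tag{9.12}
\]

If this condition holds, stabilizing $K_1$ and $K_2$ in the control law (9.10) are given by

\[ \mathbf{K} = NG^{-1} \tag{9.13} \]

where the matrices $\mathbf{K}$ and $\mathbf{B}$ are given by

\[
\mathbf{K} = \begin{bmatrix} K_1 & K_2\\ K_1 & K_2 \end{bmatrix},
\qquad
\mathbf{B} = \begin{bmatrix} B & 0\\ 0 & D \end{bmatrix}
\tag{9.14}
\]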
In implementation terms, this control law requires that all elements of the
current pass state vector are available for measurement. If this is not true then an
observer will be required.

Remark 1 The LMIs here are very similar to those known from the literature (see,
for example, [2, 9, 15]) for 1D linear systems. This does not, of course, mean that
repetitive processes can be analyzed by direct application of existing 1D linear
systems theory, merely that in some cases recourse can be made to tools from this
latter area. Even then, there are two major differences. The first of these is that the
decision matrices must be block-diagonal where the diagonal entries are of
dimensions $n \times n$ and $m \times m$ respectively (the first corresponds to the current pass
state vector and the second the previous pass profile; see the form of the
Lyapunov function). Secondly, even with no uncertainty, the Lyapunov based
stability analysis for linear repetitive processes only gives a sufficient condition
and hence there is always some conservativeness present.



9.4 Robust Stability and Stabilization of Discrete Linear 
Repetitive Processes 

In addition to Theorem 1.2 of the previous section, the design of control laws for
discrete linear repetitive processes has been the subject of much research effort
(see, for example, [5, 6, 7, 14]). Here, we continue the development of this general
area by establishing new results related to the practical case where there is possibly
large uncertainty associated with the process (state-space model) description. In
particular, we consider the case when the model matrices $\hat{A}$ and $\hat{B}$ defined by (9.6)
and (9.14) respectively are not precisely known, but belong to a convex bounded
(polytope type) uncertain domain denoted here by $\mathcal{D}$. This, in turn, means that any
uncertain matrix can be written as a convex combination of the vertices of $\mathcal{D}$ as
follows

$$\mathcal{D} = \Big\{ [\hat{A}(\xi(k,p)), \hat{B}(\xi(k,p))] : [\hat{A}(\xi(k,p)), \hat{B}(\xi(k,p))] = \sum_{i=1}^{v} \xi_i(k,p)\,[\hat{A}_i, \hat{B}_i],$$
$$\qquad \sum_{i=1}^{v} \xi_i(k,p) = 1,\ \xi_i(k,p) \ge 0,\ k \ge 0,\ 0 \le p \le \alpha - 1 \Big\} \qquad (9.15)$$

where v denotes the number of vertices. Note also that the uncertainty here is 
variable in both independent directions of information propagation, i.e. along the 
pass (p direction) and pass-to-pass (k direction). 
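A minimal sketch of the polytopic description (9.15): a point of the uncertainty domain is a convex combination of vertex matrices, with weights that may change with both k and p. The helper name `polytope_point` and the two toy vertices are illustrative assumptions, not data from the chapter.

```python
import numpy as np

def polytope_point(vertices, xi):
    """Return sum_i xi_i * V_i after checking xi_i >= 0 and sum_i xi_i = 1."""
    xi = np.asarray(xi, dtype=float)
    if xi.shape != (len(vertices),):
        raise ValueError("one weight per vertex is required")
    if np.any(xi < 0) or not np.isclose(xi.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * V for w, V in zip(xi, vertices))

# Two hypothetical vertex matrices and one admissible weight vector.
V1 = np.eye(2)
V2 = 3.0 * np.eye(2)
point = polytope_point([V1, V2], [0.25, 0.75])
print(point)
```

Calling the helper with a different weight vector at each (k, p) reproduces the two-directional variation of the uncertainty described above.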

At this stage, we can write the following linear parameter dependent state-space
model describing the process dynamics

$$x_{k+1}(p+1) = A(\xi(k,p))\,x_{k+1}(p) + B(\xi(k,p))\,u_{k+1}(p) + B_0(\xi(k,p))\,y_k(p)$$
$$y_{k+1}(p) = C(\xi(k,p))\,x_{k+1}(p) + D(\xi(k,p))\,u_{k+1}(p) + D_0(\xi(k,p))\,y_k(p) \qquad (9.16)$$

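Consider also the parameterized candidate Lyapunov function

$$V(k,p,\xi(k,p)) = x_{k+1}^T(p)\,W_1(\xi(k,p))\,x_{k+1}(p) + y_k^T(p)\,W_2(\xi(k,p))\,y_k(p) \qquad (9.17)$$

with

$$W_1(\xi(k,p)) = \sum_{i=1}^{v} \xi_i(k,p)\,W_{i1}, \qquad W_2(\xi(k,p)) = \sum_{i=1}^{v} \xi_i(k,p)\,W_{i2} \qquad (9.18)$$

and $V(0,0,\xi(0,0)) < \infty$, and associated increment

$$\Delta V(k,p,\xi(k,p)) = x_{k+1}^T(p+1)\,W_1(\xi(k,p+1))\,x_{k+1}(p+1) + y_{k+1}^T(p)\,W_2(\xi(k+1,p))\,y_{k+1}(p)$$
$$\qquad - x_{k+1}^T(p)\,W_1(\xi(k,p))\,x_{k+1}(p) - y_k^T(p)\,W_2(\xi(k,p))\,y_k(p) \qquad (9.19)$$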

Then we can define so-called poly-quadratic stability (see [2, 9, 15] for the 1D
systems case) as follows.

Definition 1.3 A discrete linear repetitive process described by (9.2) and (9.3)
with uncertainty defined by (9.15) and Lyapunov function (9.17) and (9.18) is said
to be poly-quadratically stable provided

$$\Delta V(k,p,\xi(k,p)) < 0 \qquad (9.20)$$

for all $k \ge 0$, $0 \le p \le \alpha - 1$.

The requirement of (9.20) can be written as

$$\hat{A}(\xi(k,p))^T\,W^+\,\hat{A}(\xi(k,p)) - W \prec 0 \qquad (9.21)$$

where $W = \mathrm{diag}\{W_1(\xi(k,p)), W_2(\xi(k,p))\}$ is defined by (9.18),

$$W^+ = \mathrm{diag}\{W_1(\xi(k,p+1)), W_2(\xi(k+1,p))\}
= \sum_{i=1}^{v} \mathrm{diag}\{\xi_i(k,p+1)\,W_{i1},\ \xi_i(k+1,p)\,W_{i2}\}
= \sum_{j=1}^{v} \zeta_j(k,p)\,\mathrm{diag}\{W_{j1}, W_{j2}\}$$

with $\sum_{j=1}^{v} \zeta_j(k,p) = 1$, $\zeta_j(k,p) \ge 0$, $k \ge 0$, $0 \le p \le \alpha - 1$. We also require that
$\xi_j(k,p+1) = \xi_j(k+1,p) = \zeta_j(k,p)$, and the matrix $\hat{A}$ of (9.6) in this case becomes

$$\hat{A}(\xi(k,p)) = \sum_{i=1}^{v} \xi_i(k,p)\,\hat{A}_i \qquad (9.22)$$

where $\hat{A}_i$ are the polytope vertices (see (9.15)).

Remark 2 When

$$\mathrm{diag}\{W_1(\xi(k,p+1)), W_2(\xi(k+1,p))\} = \mathrm{diag}\{W_1(\xi(k,p)), W_2(\xi(k,p))\} = W$$

and hence $\xi_i(k,p) = \zeta_i(k,p)$, $i = 1, \dots, v$, poly-quadratic stability reduces to
stability along the pass (Theorem 1.1).

The following result (drawing on the work in [2]) aims to minimize the con- 
servativeness present from the use of a sufficient, but not necessary, stability 
condition. 

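Theorem 1.4 A discrete linear repetitive process described by (9.2) and (9.3)
with uncertainty defined by (9.15) is poly-quadratically stable if there exist
block-diagonal matrices $S_i \succ 0$, $i = 1, \dots, v$, i.e. $S_i = \mathrm{diag}\{S_{i1}, S_{i2}\}$, and a matrix $G$
such that

$$\begin{bmatrix} G + G^T - S_i & G^T \hat{A}_i^T \\ \hat{A}_i G & S_j \end{bmatrix} \succ 0 \qquad (9.23)$$

for all $i, j = 1, \dots, v$.

Proof Assume that (9.23) is feasible for all $i, j = 1, \dots, v$. Then

$$G + G^T - S_i \succ 0$$

and, since $G$ is full rank and $S_i \succ 0$,

$$(S_i - G)^T S_i^{-1} (S_i - G) \succeq 0$$

or, equivalently,

$$G^T S_i^{-1} G \succeq G^T + G - S_i$$

Hence if (9.23) holds

$$\begin{bmatrix} G^T S_i^{-1} G & G^T \hat{A}_i^T \\ \hat{A}_i G & S_j \end{bmatrix} \succ 0$$

or equivalently,

$$\begin{bmatrix} G^T & 0 \\ 0 & S_j \end{bmatrix}
\begin{bmatrix} S_i^{-1} & \hat{A}_i^T S_j^{-1} \\ S_j^{-1} \hat{A}_i & S_j^{-1} \end{bmatrix}
\begin{bmatrix} G & 0 \\ 0 & S_j \end{bmatrix} \succ 0$$

Next, introduce the substitutions $W_i = S_i^{-1}$ and $W_j = S_j^{-1}$ to obtain

$$\begin{bmatrix} W_i & \hat{A}_i^T W_j \\ W_j \hat{A}_i & W_j \end{bmatrix} \succ 0$$

for all $i, j = 1, \dots, v$. Further, for each $i$, multiply the corresponding inequalities for
$j = 1, \dots, v$ by $\zeta_j(k,p)$ and sum over $j$ to obtain

$$\begin{bmatrix} W_i & \hat{A}_i^T W^+ \\ W^+ \hat{A}_i & W^+ \end{bmatrix} \succ 0$$

Also $\sum_{j=1}^{v} \zeta_j(k,p)\,W_j = W^+$, and multiplying the resulting inequalities by $\xi_i(k,p)$
for $i = 1, \dots, v$, and summing over $i$ gives

$$\begin{bmatrix} \sum_{i=1}^{v} \xi_i(k,p)\,W_i & \left(\sum_{i=1}^{v} \xi_i(k,p)\,\hat{A}_i\right)^T W^+ \\ W^+ \sum_{i=1}^{v} \xi_i(k,p)\,\hat{A}_i & W^+ \end{bmatrix} \succ 0$$

or, equivalently,

$$\begin{bmatrix} W & \hat{A}^T(\xi(k,p))\,W^+ \\ W^+ \hat{A}(\xi(k,p)) & W^+ \end{bmatrix} \succ 0 \qquad (9.24)$$

Finally, applying Definition 1.3, followed by an obvious application of the Schur
complement formula, to (9.24) gives

$$W - \hat{A}(\xi(k,p))^T\,W^+\,\hat{A}(\xi(k,p)) \succ 0 \qquad (9.25)$$

which is equivalent to (9.21) and the proof is complete. □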
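The chain of implications in the proof can be checked numerically for given candidate matrices. The sketch below is a verification helper under stated assumptions, not a design method: it does not solve the LMI (an SDP solver would be needed for that), and the function names, the two vertex matrices and the choice S_i = G = I are hypothetical. It tests (9.23) for all vertex pairs by an eigenvalue check and then confirms the vertex inequality W_i - A_i^T W_j A_i > 0 with W_i equal to the inverse of S_i.

```python
import numpy as np

def is_pos_def(M, tol=1e-9):
    """True if the symmetric part of M has all eigenvalues above tol."""
    M = 0.5 * (M + M.T)
    return bool(np.linalg.eigvalsh(M).min() > tol)

def lmi_9_23_holds(A_vertices, S_list, G):
    """Check the block inequality of (9.23) for every pair (i, j)."""
    for A_i, S_i in zip(A_vertices, S_list):
        for S_j in S_list:
            M = np.block([[G + G.T - S_i, G.T @ A_i.T], [A_i @ G, S_j]])
            if not is_pos_def(M):
                return False
    return True

def vertex_lyapunov_holds(A_vertices, S_list):
    """Check W_i - A_i^T W_j A_i > 0 for all (i, j), with W_i = S_i^{-1}."""
    W = [np.linalg.inv(S) for S in S_list]
    return all(
        is_pos_def(W_i - A_i.T @ W_j @ A_i)
        for A_i, W_i in zip(A_vertices, W)
        for W_j in W
    )

# Hypothetical two-vertex example with n = m = 1, so diag{S_i1, S_i2} is 2 x 2.
A_vertices = [np.array([[0.5, 0.1], [0.0, 0.4]]), np.array([[0.3, 0.0], [0.1, 0.5]])]
S_list = [np.eye(2), np.eye(2)]
G = np.eye(2)

feasible = lmi_9_23_holds(A_vertices, S_list, G)
implied = vertex_lyapunov_holds(A_vertices, S_list)
unstable = lmi_9_23_holds([1.2 * np.eye(2)], [np.eye(2)], np.eye(2))
print(feasible, implied, unstable)
```

With identity choices for S_i and G the block inequality reduces to a norm bound on each vertex, which is why the third call, using a vertex of norm 1.2, reports infeasibility for that particular candidate.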



Remark 3 Recall that the diagonal structure of Si in this last result arises directly 
from the stability theory for discrete linear repetitive processes. As noted previ- 
ously, this leads to only sufficient conditions and hence some conservativeness can 
be present which is, however, lower than if the matrices G, were also taken as 
block-diagonal (these matrices only relate to the corresponding LMI construction). 



With the control law (9.10) applied, (9.23) becomes

$$\begin{bmatrix} G + G^T - S_i & G^T(\hat{A}_i + \hat{B}_i\hat{K})^T \\ (\hat{A}_i + \hat{B}_i\hat{K})\,G & S_j \end{bmatrix} \succ 0 \qquad (9.26)$$

where $\hat{K}$ is defined in (9.14), and the following result now gives a sufficient
condition for the existence of a poly-quadratically stabilizing control law of the
form (9.10).

Theorem 1.5 Suppose that a control law of the form (9.10) is applied to a discrete
linear repetitive process described by (9.2) and (9.3) with uncertainty defined by
(9.15). Then the resulting process is poly-quadratically stabilizable if there exist
matrices $S_i \succ 0$, $i = 1, \dots, v$, i.e. $S_i = \mathrm{diag}\{S_{i1}, S_{i2}\}$, $G$, and $N$ (defined by (9.11))
such that

$$\begin{bmatrix} G + G^T - S_i & G^T\hat{A}_i^T + N^T\hat{B}_i^T \\ \hat{A}_i G + \hat{B}_i N & S_j \end{bmatrix} \succ 0 \qquad (9.27)$$

for all $i, j = 1, \dots, v$. If this condition holds then stabilizing $K_1$ and $K_2$ in the
control law (9.10) are given by

$$\hat{K} = N G^{-1} \qquad (9.28)$$

where $\hat{K}$ was defined in (9.14).

Proof This result follows immediately from (9.26), on setting $\hat{K}G = N$. □



In the next section we consider parameter variable control laws in an attempt to 
reduce the level of conservativeness associated with the results so far. 



9.5 A Parameter Variable Control Law 



To enlarge the admissible uncertainty range for which stabilization is still possible 
(and hence reduce conservativeness), this section considers a parameter variable 
control law of the form 



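$$u_{k+1}(p) = K(\xi(k,p)) \begin{bmatrix} x_{k+1}(p) \\ y_k(p) \end{bmatrix}
= \begin{bmatrix} K_1(\xi(k,p)) & K_2(\xi(k,p)) \end{bmatrix} \begin{bmatrix} x_{k+1}(p) \\ y_k(p) \end{bmatrix}
= \sum_{i=1}^{v} \xi_i(k,p) \begin{bmatrix} K_{i1} & K_{i2} \end{bmatrix} \begin{bmatrix} x_{k+1}(p) \\ y_k(p) \end{bmatrix} \qquad (9.29)$$

over $0 \le p \le \alpha - 1$, $k \ge 0$, where the designed matrix $K(\xi(k,p))$ contains
uncertainty as defined in (9.15). In effect, this control law is composed of the weighted
sum of current pass state feedback and feedforward of the previous pass profile and
we have the following result (which can be interpreted as the generalization to the
repetitive process case of a well known result [2]).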

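Theorem 1.6 A discrete linear repetitive process described by (9.2) and (9.3)
with uncertainty structure of the form (9.15) is poly-quadratically stable if there
exist matrices $S_i \succ 0$, i.e. $S_i = \mathrm{diag}\{S_{i1}, S_{i2}\} \succ 0$, and $G_i$, $i = 1, \dots, v$, such that

$$\begin{bmatrix} G_i + G_i^T - S_i & G_i^T\hat{A}_i^T \\ \hat{A}_i G_i & S_j \end{bmatrix} \succ 0 \qquad (9.30)$$

for all $i, j = 1, \dots, v$.

Proof Assume that (9.30) is feasible for all $i, j = 1, \dots, v$. Then

$$G_i + G_i^T - S_i \succ 0$$

and, since $G_i$ is full rank and $S_i \succ 0$,

$$(S_i - G_i)^T S_i^{-1} (S_i - G_i) \succeq 0$$

or, equivalently,

$$G_i^T S_i^{-1} G_i \succeq G_i^T + G_i - S_i$$

Hence if (9.30) holds

$$\begin{bmatrix} G_i^T S_i^{-1} G_i & G_i^T\hat{A}_i^T \\ \hat{A}_i G_i & S_j \end{bmatrix} \succ 0$$

or equivalently,

$$\begin{bmatrix} G_i^T & 0 \\ 0 & S_j \end{bmatrix}
\begin{bmatrix} S_i^{-1} & \hat{A}_i^T S_j^{-1} \\ S_j^{-1}\hat{A}_i & S_j^{-1} \end{bmatrix}
\begin{bmatrix} G_i & 0 \\ 0 & S_j \end{bmatrix} \succ 0$$

Next, introduce the substitutions $W_i = S_i^{-1}$ and $W_j = S_j^{-1}$ to obtain

$$\begin{bmatrix} W_i & \hat{A}_i^T W_j \\ W_j\hat{A}_i & W_j \end{bmatrix} \succ 0$$

for all $i, j = 1, \dots, v$. Further, for each $i$, multiply the corresponding inequalities
for $j = 1, \dots, v$ by $\zeta_j(k,p)$ and sum over $j$ to obtain

$$\begin{bmatrix} W_i & \hat{A}_i^T W^+ \\ W^+\hat{A}_i & W^+ \end{bmatrix} \succ 0$$

Also $\sum_{j=1}^{v} \zeta_j(k,p)\,W_j = W^+$, and multiplying the resulting inequalities by $\xi_i(k,p)$
for $i = 1, \dots, v$, and summing over $i$ gives

$$\begin{bmatrix} \sum_{i=1}^{v} \xi_i(k,p)\,W_i & \left(\sum_{i=1}^{v} \xi_i(k,p)\,\hat{A}_i\right)^T W^+ \\ W^+ \sum_{i=1}^{v} \xi_i(k,p)\,\hat{A}_i & W^+ \end{bmatrix} \succ 0$$

or, equivalently,

$$\begin{bmatrix} W & \hat{A}^T(\xi(k,p))\,W^+ \\ W^+\hat{A}(\xi(k,p)) & W^+ \end{bmatrix} \succ 0 \qquad (9.31)$$

Applying Definition 1.3, followed by an obvious application of the Schur
complement formula, to (9.31) now gives

$$W - \hat{A}(\xi(k,p))^T\,W^+\,\hat{A}(\xi(k,p)) \succ 0 \qquad (9.32)$$

which is equivalent to (9.21) and the proof is complete. □

Applying the control law (9.29) to (9.30) gives

$$\begin{bmatrix} G_i + G_i^T - S_i & G_i^T(\hat{A}_i + \hat{B}_i\hat{K}_i)^T \\ (\hat{A}_i + \hat{B}_i\hat{K}_i)\,G_i & S_j \end{bmatrix} \succ 0 \qquad (9.33)$$

where the matrix $\hat{K}_i$, $i = 1, \dots, v$, is given by

$$\hat{K}_i = \begin{bmatrix} K_{i1} & K_{i2} \\ K_{i1} & K_{i2} \end{bmatrix} \qquad (9.34)$$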



The following result now gives a condition sufficient for the existence of a poly- 
quadratically stabilizing control law of the form (9.29) together with a design 
algorithm. 

Theorem 1.7 Suppose that a control law of the form (9.29) is applied to a discrete
linear repetitive process described by (9.2) with uncertainty of the form (9.15).
Then the resulting process is poly-quadratically stable if there exist block-diagonal
matrices $S_i \succ 0$, i.e. $S_i = \mathrm{diag}\{S_{i1}, S_{i2}\} \succ 0$, $G_i$, and

$$N_i = \begin{bmatrix} N_{i1} & N_{i2} \\ N_{i1} & N_{i2} \end{bmatrix} \qquad (9.35)$$

$i = 1, \dots, v$, such that

$$\begin{bmatrix} G_i + G_i^T - S_i & G_i^T\hat{A}_i^T + N_i^T\hat{B}_i^T \\ \hat{A}_i G_i + \hat{B}_i N_i & S_j \end{bmatrix} \succ 0 \qquad (9.36)$$

for all $i, j = 1, \dots, v$. If these conditions hold, the vertex matrices $K_{i1}$ and $K_{i2}$ in the
stabilizing control law of (9.29) are given by

$$\hat{K}_i = N_i G_i^{-1} \qquad (9.37)$$

where $\hat{K}_i$ is defined in (9.34).

Proof This follows immediately from (9.33) on setting $\hat{K}_i G_i = N_i$. □



Theorem 1.7 and (9.29) provide the setting for the construction of the control
law considered to stabilize the process with much wider uncertainty, or variability
domains, than alternatives. However, the final control law is still not available at
this stage as only the vertex values of the required matrices are known. To complete
the task, we need to know the exact parameters $\xi_i(k,p)$, $k \ge 0$, $0 \le p \le \alpha - 1$, and
$i = 1, \dots, v$, which can be recovered from the process dynamics by using the
Matlab function "fmincon" (or any equivalent algorithm) to solve the following
problem:

Determine $\xi_i(k,p) \in \mathbb{R}^+$, $i = 1, \dots, v$, such that

$$\sum_{i=1}^{v} \xi_i(k,p)\,V_i = P(k,p) \qquad (9.38)$$

where

$$\sum_{i=1}^{v} \xi_i(k,p) = 1,\quad \xi_i(k,p) \ge 0,\quad k \ge 0,\quad 0 \le p \le \alpha - 1$$

and

$$P(k,p) = \{A(k,p), B(k,p), B_0(k,p), C(k,p), D(k,p), D_0(k,p)\} \qquad (9.39)$$

where $V_i$ denotes the convex domain vertices and $P(k,p)$ the process state-space
model matrix which is assumed to lie in the polytope constructed from them. (This
construction will be explained in detail for the material rolling example considered
in the next section).

Given $\xi_i(k,p)$, $k \ge 0$, $0 \le p \le \alpha - 1$, $i = 1, \dots, v$, and the vertex matrices of the
control law (9.29) from (9.37), we can complete the design using

$$K(\xi(k,p)) = \begin{bmatrix} K_1(\xi(k,p)) & K_2(\xi(k,p)) \end{bmatrix} = \sum_{i=1}^{v} \xi_i(k,p) \begin{bmatrix} K_{i1} & K_{i2} \end{bmatrix} \qquad (9.40)$$
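The weight-recovery problem (9.38) can be sketched in Python. The text uses Matlab's "fmincon"; the code below swaps in a plain equality-constrained least-squares solve with NumPy, which is only an assumption-laden stand-in: it presumes that the vertices are affinely independent, so that the weights are unique, and it merely checks non-negativity afterwards instead of enforcing it. The function name `recover_weights` and the toy vertices are hypothetical.

```python
import numpy as np

def recover_weights(vertices, P, tol=1e-8):
    """Solve sum_i xi_i V_i = P together with sum_i xi_i = 1 by least squares.

    Stand-in for a constrained optimiser; assumes affinely independent vertices.
    """
    V = np.column_stack([np.ravel(V_i) for V_i in vertices])
    M = np.vstack([V, np.ones((1, V.shape[1]))])       # append the sum-to-one row
    rhs = np.concatenate([np.ravel(P), [1.0]])
    xi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    mismatch = np.linalg.norm(M @ xi - rhs)
    if xi.min() < -tol or mismatch > tol * max(1.0, np.linalg.norm(rhs)):
        raise ValueError("P does not appear to lie in the polytope")
    return np.clip(xi, 0.0, None)

# Hypothetical vertices and a model matrix built from known weights.
V1 = np.array([[1.0, 0.0], [0.0, 1.0]])
V2 = np.array([[2.0, 1.0], [0.0, 1.0]])
V3 = np.array([[1.0, 0.0], [1.0, 3.0]])
xi_true = np.array([0.2, 0.5, 0.3])
P = sum(w * V for w, V in zip(xi_true, (V1, V2, V3)))

xi = recover_weights([V1, V2, V3], P)
print(xi)
```

When the vertices are not affinely independent the weights are not unique, and a solver that handles the inequality constraints explicitly, as in the text, is the safer choice.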



9.6 Application to Material Rolling 

In this section we illustrate Theorem 1.7 by application to the material rolling
model of Sect. 9.2 when the model parameters $\lambda_1$, $\lambda_2$ are uncertain and the rest of
the parameters (i.e. $T$ and $M$) are constant. Also we take $T = 0.2$ s, $M = 100$ kg, and
assume that the model parameters $\lambda_1$ and $\lambda_2$ satisfy

$$\lambda_1 \in [\underline{\lambda}_1, \overline{\lambda}_1] = [0.216, 0.984], \qquad \lambda_2 \in [\underline{\lambda}_2, \overline{\lambda}_2] = [0.72, 3.28] \qquad (9.41)$$

Note first that a control law of the form of Theorem 1.5 can be computed in this
case but requires that $\lambda_1$ and $\lambda_2$ satisfy

$$\lambda_1 \in [\underline{\lambda}_1, \overline{\lambda}_1] = [0.228, 0.978], \qquad \lambda_2 \in [\underline{\lambda}_2, \overline{\lambda}_2] = [0.76, 3.26]$$

i.e. this design comes at the price of a more restrictive range for the values of $\lambda_1$
and $\lambda_2$.

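The fact that this problem has two variables that can vary means that there are
four uncertainty domain vertices. Also the uncertainty domain for the process
matrices is easily verified as convex with vertices as follows

vertex 1

$$A = \begin{bmatrix} 0.9377 & 187.5361 \\ -3.116 \cdot 10^{-4} & 0.9377 \end{bmatrix},\quad
B = \begin{bmatrix} -0.0866 \\ -4.3278 \cdot 10^{-4} \end{bmatrix},\quad
B_0 = \begin{bmatrix} 0.0144 \\ 7.1907 \cdot 10^{-5} \end{bmatrix}$$
$$C = \begin{bmatrix} 0.9377 & 187.5361 \end{bmatrix},\quad D = -0.0866,\quad D_0 = 0.7836$$

vertex 2

$$A = \begin{bmatrix} 0.7676 & 153.5191 \\ -1.162 \cdot 10^{-3} & 0.7676 \end{bmatrix},\quad
B = \begin{bmatrix} -0.0709 \\ -3.5427 \cdot 10^{-4} \end{bmatrix},\quad
B_0 = \begin{bmatrix} 0.0536 \\ 2.6816 \cdot 10^{-4} \end{bmatrix}$$
$$C = \begin{bmatrix} 0.7676 & 153.5191 \end{bmatrix},\quad D = -0.0709,\quad D_0 = 0.8229$$

vertex 3

$$A = \begin{bmatrix} 0.8574 & 171.4811 \\ -7.1297 \cdot 10^{-4} & 0.8574 \end{bmatrix},\quad
B = \begin{bmatrix} -0.198 \\ -9.9024 \cdot 10^{-4} \end{bmatrix},\quad
B_0 = \begin{bmatrix} 0.0823 \\ 4.1172 \cdot 10^{-4} \end{bmatrix}$$
$$C = \begin{bmatrix} 0.8574 & 171.4811 \end{bmatrix},\quad D = -0.198,\quad D_0 = 0.5049$$

vertex 4

$$A = \begin{bmatrix} 0.925 & 185.0033 \\ -3.7492 \cdot 10^{-4} & 0.925 \end{bmatrix},\quad
B = \begin{bmatrix} -0.0229 \\ -1.143 \cdot 10^{-4} \end{bmatrix},\quad
B_0 = \begin{bmatrix} 4.6328 \cdot 10^{-3} \\ 2.32 \cdot 10^{-5} \end{bmatrix}$$
$$C = \begin{bmatrix} 0.925 & 185.0033 \end{bmatrix},\quad D = -0.0229,\quad D_0 = 0.9428$$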



The parameters $\lambda_1$, $\lambda_2$ vary with $k$ and $p$ and, since they are different on each pass,
we denote them by $\lambda_1(k,p)$ and $\lambda_2(k,p)$, respectively. Also they are assumed to lie
in the fixed intervals given by (9.41) as shown in Fig. 9.2.

Theorem 1.7 in this case can provide a variety of possible control laws but it is 
very difficult to select the one which will also satisfy the performance specifica- 
tions. To overcome this problem, we use the following corollary of Theorem 1.7. 

Corollary 1.8 Suppose that a control law of the form (9.29) is applied to a
discrete linear repetitive process described by (9.2) and (9.3) with uncertainty of
the form (9.15). Then the resulting process is poly-quadratically stable if there
exist block-diagonal matrices $S_i$, i.e. $S_i = \mathrm{diag}\{S_{i1}, S_{i2}\} \succ 0$, and matrices $G_i$ and
$N_i$, $i = 1, \dots, v$ (where $N_i$ is defined in (9.35)), such that the following convex
optimization problem has a solution

$$\text{maximize} \quad F = \sum_{i=1}^{v} \mathrm{trace}(G_i)$$
$$\text{subject to} \quad \begin{bmatrix} G_i + G_i^T - S_i & G_i^T\hat{A}_i^T + N_i^T\hat{B}_i^T \\ \hat{A}_i G_i + \hat{B}_i N_i & S_j \end{bmatrix} \succ 0 \qquad (9.42)$$

Also we require that the matrices $G_i$ and $S_i$ are diagonal for all $i = 1, \dots, v$. The
control law matrix vertices are obtained as in Theorem 1.7.

Fig. 9.2 a Values for $\lambda_1(k,p)$; b values for $\lambda_2(k,p)$ (number of passes: 25; number of points: 25; range: passes [0, 24], points [0, 24]; axes: along the pass, pass to pass)



For the case considered here, this last result gives the following control law
matrix vertices

$$K_{11} = [3.4741 \quad 2166.7], \qquad K_{12} = 0.0873$$
$$K_{21} = [4.4876 \quad 2166.7], \qquad K_{22} = 0.1504$$
$$K_{31} = [2.0452 \quad 865.8534], \qquad K_{32} = 0.0818$$
$$K_{41} = [10.6 \quad 8092.6], \qquad K_{42} = 0.215$$



Now we are in a position to develop the procedure by which the variable control
law matrices for a given pass $k$ and point $p$ are derived. Consider the matrix of (9.39)
for given $T$, $M$, $\lambda_1(k,p)$ and $\lambda_2(k,p)$, where $A(k,p), \dots, D_0(k,p)$ denote the system
matrices $A, \dots, D_0$ computed for $T = t$, $M = m$, and variable parameters $\lambda_1 =
\lambda_1(k,p)$ and $\lambda_2 = \lambda_2(k,p)$ from (9.1)-(9.2) at given $k, p$. Having obtained $P(k,p)$
for $k, p$ we can now recover from (9.38) (by using an algorithm based on the
Matlab function "fmincon") the underlying parameters $\xi_i(k,p)$, $k \ge 0$, $0 \le p \le \alpha - 1$,
and $i = 1, \dots, v$. Then, since the matrix vertices at $(k,p)$ are known, it is a simple task
to calculate the variable control law matrices at $(k,p)$ using (9.40). Next, we go to
$p + 1$ if $p < \alpha - 1$ with $k$ unchanged or, if $p = \alpha - 1$, we go to $k + 1$ and set $p = 0$, and
so on.
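The sweep just described, together with the interpolation (9.40), can be sketched as follows. The vertex gains are the values printed above for the rolling example; everything else is illustrative. In particular, the uniform weight vector used in the loop is a placeholder assumption standing in for the weights that would be recovered from (9.38) at each (k, p), and the function names are hypothetical.

```python
import numpy as np

# Vertex gains [K_i1, K_i2] as printed in the text for the rolling example.
K_VERTICES = [
    np.array([3.4741, 2166.7, 0.0873]),
    np.array([4.4876, 2166.7, 0.1504]),
    np.array([2.0452, 865.8534, 0.0818]),
    np.array([10.6, 8092.6, 0.215]),
]

def scheduled_gain(xi):
    """Eq. (9.40): K(xi) = sum_i xi_i [K_i1 K_i2]."""
    xi = np.asarray(xi, dtype=float)
    return sum(w * K for w, K in zip(xi, K_VERTICES))

def sweep(alpha, n_passes):
    """Visit p = 0, ..., alpha - 1 on pass k, then move to pass k + 1 with p = 0."""
    for k in range(n_passes):
        for p in range(alpha):
            yield k, p

gains = {}
for k, p in sweep(alpha=25, n_passes=25):
    xi = np.full(4, 0.25)            # placeholder weights, not recovered from data
    gains[(k, p)] = scheduled_gain(xi)

print(len(gains), gains[(0, 0)])
```

Because the gain is a convex combination of the vertex gains, each of its entries stays between the smallest and largest corresponding vertex value for any admissible weights.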

Without control action, the process here is unstable along the pass. Suppose
also that the task is to reduce the thickness of the workpiece by one unit in the case
when $\alpha = 25$. Then, since the process dynamics are assumed to be linear, we can
take the boundary conditions to be $x_{k+1}(0) = 0$, $k \ge 0$, and $y_0(p) = 1$, $0 \le p \le 24$.

Fig. 9.3 a The process pass profile sequence for the controlled process; b the control input
sequence (number of passes: 25; number of points: 25; range: passes [0, 24], points [0, 24])

Hence as $k$ increases the sequence of pass profiles should approach zero. This is
confirmed by the plot of Fig. 9.3a. Two other major issues need to be considered:
the transient behavior in both $p$ and $k$, and the magnitude of the control input
signal. If the transient performance is not acceptable, the only option is to return
and attempt to tune the design. For the second, Fig. 9.3b shows the control input
sequence required to apply the control law in this case. If this is unacceptable, then
further development is required and again this is left as a subject for further work;
the result here shows that the control action is bounded and hence baseline acceptable.

Finally, note that the control action energy required (maximum absolute value 
of the control signal) is much lower than that arising when Corollary 1.8 is not 
applied. Now, however, more passes must be completed before the control 
objectives are met. 



9.7 Conclusions 



This chapter has focused on the control of discrete linear repetitive processes in the 
presence of uncertainty in the state-space model used for control law design. A 
review of previous results in this area leads to the conclusion that the design of 
practically relevant control laws is possible, but the uncertainty model which has 
been used to obtain such results may be restrictive in the sense that there is little 
margin for tuning the control law matrices to obtain stability plus desired perfor- 
mance. The main objective in the work reported in this chapter is to develop control 
laws which vary in both the pass number and along the pass variables. This has been 
achieved by allowing the control law matrices to vary with both the pass number and 
the along the pass variable. The result is an algorithm for basic selection of the 
control law matrices without destroying the static nature of the control law.

Chapter 10

Unique Full-Rank Solution 

of the Sylvester-Observer Equation 

and Its Application to State Estimation 

in Control Design 

Karabi Datta and Mohan Thapa 



Abstract The Sylvester matrix equation $AX - XF = R$, where $A$, $F$ and $R$ are given
and $X$ needs to be found, is a classical equation. There has been much study,
from both theoretical and computational viewpoints, of this equation. The results
on existence and uniqueness are well known, and numerically effective algorithms
have been developed in recent years (see Datta [2]) to compute the solution.



10.1 Introduction 

The Sylvester matrix equation

$$AX - XF = R \qquad (10.1)$$

where $A$, $F$ and $R$ are given and $X$ needs to be found, is a classical equation. There
has been much study, from both theoretical and computational viewpoints, of this
equation. The results on existence and uniqueness are well known, and numerically
effective algorithms have been developed in recent years (see Datta [2]) to
compute the solution $X$. A variation of this equation called the Sylvester-observer
equation (Luenberger [3], Datta [2]) arises in the design of the Luenberger observer in
control theory. In this variation, the matrix $A$ and partial information on $F$ and $R$
are known, and the matrix $X$ and the unknown parts of $F$ and $R$ are to be found. The
matrix $F$ is required to have a pre-assigned spectrum and the matrix $X$ must be a
full-rank matrix.

Several computationally viable algorithms have been developed in recent
years for the solution of the Sylvester-observer equation. These include (i) the
observer-Hessenberg method of Van Dooren [4], (ii) a generalization of the Van
Dooren method to the block case by Carvalho and Datta [5], (iii) a new block
algorithm by Carvalho et al. [6], (iv) the SVD-based algorithm by Datta and
Sarkissian [7], (v) the Arnoldi-based method for large-scale solution by Datta
and Saad [8] and Calvetti et al. [9], and a parallel and high-performance algorithm by
Bischof et al. [10].

All the above methods implicitly construct a full-rank solution assuming that
such a solution exists, but no systematic study of this assumption has been done yet.
The purpose of this paper is to study the existence and uniqueness of a full-rank
solution to the Sylvester-observer equation when the spectrum of $F$ is prescribed in
advance. The results are new and applicable in a computational setting to determine
if such a solution exists, when the matrix $A$ and partial information on $F$ and $R$ are given.
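For small problems the Sylvester equation (10.1) can be solved directly by vectorization, which also makes the uniqueness condition visible: the Kronecker-structured coefficient matrix is nonsingular exactly when A and F share no eigenvalue. The sketch below is a minimal illustration and not one of the algorithms cited above; the function name `solve_sylvester_kron` and the test matrices are hypothetical, and the cost grows quickly with the problem size.

```python
import numpy as np

def solve_sylvester_kron(A, F, R):
    """Solve A X - X F = R via (I kron A - F^T kron I) vec(X) = vec(R)."""
    n, s = R.shape
    K = np.kron(np.eye(s), A) - np.kron(F.T, np.eye(n))
    x = np.linalg.solve(K, R.flatten(order="F"))      # column-major vec
    return x.reshape((n, s), order="F")

# Hypothetical data: the spectra {1, 2, 3} and {-1, -2} are disjoint.
A = np.diag([1.0, 2.0, 3.0])
F = np.array([[-1.0, 1.0], [0.0, -2.0]])
R = np.ones((3, 2))

X = solve_sylvester_kron(A, F, R)
residual = np.linalg.norm(A @ X - X @ F - R)
print(np.linalg.matrix_rank(X), residual)
```

Production codes would instead use a Hessenberg-Schur or Bartels-Stewart type solver, which avoids forming the large Kronecker matrix.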

10.2 Full-Rank Solution of the Sylvester Equation 

Consider the Sylvester equation 

$$AX - XF = R, \qquad (10.2)$$

where $A$, $F$, $R$ are, respectively, of order $n \times n$, $s \times s$, and $n \times s$. Let $X$ be a unique
solution of Eq. (10.2); that is, $\Omega(A) \cap \Omega(F) = \emptyset$. The following result on the
existence of a full-rank solution of (10.2) was proved by de Souza and Bhattacharyya [11].

Theorem 1 [11] Necessary conditions for the unique solution $X$ of (10.2) to be of
full rank are that $(A, R)$ is controllable and $(R, F)$ is observable. These conditions are
also sufficient if $R$ has rank 1.

A necessary and sufficient condition in the general case was obtained in Datta 
et al. [12]. 

Theorem 2 [12] Let $A$, $F$ and $R$ be, respectively, $n \times n$, $s \times s$ and $n \times s$
matrices. Then

$$AX - XF = R \qquad (10.3)$$

has a unique full-rank solution if and only if the matrix

$$S_k(R) = \begin{bmatrix} R & AR & \cdots & A^kR \end{bmatrix}_{n \times (k+1)s}
\begin{pmatrix} I & a_1 I & \cdots & a_k I \\ & I & \cdots & a_{k-1} I \\ & & \ddots & \vdots \\ 0 & & & I \end{pmatrix}_{(k+1)s \times (k+1)s}
\begin{pmatrix} F^k \\ F^{k-1} \\ \vdots \\ I \end{pmatrix}_{(k+1)s \times s}$$

has full rank $s$, where $0 < k + 1 \le s$ is the degree of the minimal or the
characteristic polynomial of $F$, and the $a_i$'s are the coefficients of the characteristic
(minimal) polynomial of $F$.
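The rank test of Theorem 2 can be tried on a small example. The sketch below assumes the characteristic polynomial of F is used (so k + 1 = s) and writes it as lambda^s + a_1 lambda^(s-1) + ... + a_s, which is the coefficient ordering returned by `numpy.poly`; the function names and test data are hypothetical. It forms the triple product defining S_k(R) term by term and compares it with p(A) X, where p is that polynomial and X is obtained from a Kronecker solve; since p(A) is nonsingular when A and F share no eigenvalue, the two matrices have the same rank.

```python
import numpy as np

def solve_sylvester_kron(A, F, R):
    """Solve A X - X F = R by vectorization (small problems only)."""
    n, s = R.shape
    K = np.kron(np.eye(s), A) - np.kron(F.T, np.eye(n))
    return np.linalg.solve(K, R.flatten(order="F")).reshape((n, s), order="F")

def s_k_matrix(A, F, R):
    """S_k(R) = sum_i A^i R (F^(k-i) + a_1 F^(k-i-1) + ... + a_(k-i) I), k = s - 1."""
    s = F.shape[0]
    a = np.poly(F)                      # [1, a_1, ..., a_s]
    k = s - 1
    S = np.zeros(R.shape)
    AiR = np.array(R, dtype=float)      # holds A^i R, starting with i = 0
    for i in range(k + 1):
        G = np.eye(s)
        for j in range(1, k - i + 1):   # Horner evaluation of the inner polynomial
            G = G @ F + a[j] * np.eye(s)
        S += AiR @ G
        AiR = A @ AiR
    return S

A = np.array([[1.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 3.0]])
F = np.array([[-1.0, 1.0], [0.0, -2.0]])
R = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

X = solve_sylvester_kron(A, F, R)
S = s_k_matrix(A, F, R)
a = np.poly(F)
p_of_A = sum(c * np.linalg.matrix_power(A, len(a) - 1 - j) for j, c in enumerate(a))
print(np.allclose(p_of_A @ X, S), np.linalg.matrix_rank(S), np.linalg.matrix_rank(X))
```

The comparison makes explicit why the rank of S_k(R) decides whether the solution has full rank.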

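Forming the matrix $S_k(R)$ and checking its rank numerically is a computationally
prohibitive task. Below we show how these computations can be made
more effective and practical by taking advantage of results from control theory and
certain results exploiting the structures of the associated matrices.

It is well known (see Datta [2]) that a controllable pair $(A, R)$ can be transformed
to a controller-Hessenberg form $(H, \tilde{R})$ by an orthogonal similarity, where
$H$ is a block upper-Hessenberg matrix and $\tilde{R}$ has the special structure given
below. We will assume that $(A, R)$ has been given in that form; that is, $A = H$ and
$R = \tilde{R}$; thus

$$H = \begin{pmatrix} H_{11} & H_{12} & \cdots & H_{1p} \\ H_{21} & H_{22} & \cdots & H_{2p} \\ & \ddots & \ddots & \vdots \\ 0 & & H_{p,p-1} & H_{pp} \end{pmatrix}_{n \times n}, \qquad
R = \tilde{R} = \begin{pmatrix} (R_1)_{n_1 \times n_1} & (0)_{n_1 \times n_2} & \cdots & (0)_{n_1 \times n_r} \\ (0)_{n_2 \times n_1} & (0)_{n_2 \times n_2} & \cdots & (0)_{n_2 \times n_r} \\ \vdots & & & \vdots \\ (0)_{n_p \times n_1} & (0)_{n_p \times n_2} & \cdots & (0)_{n_p \times n_r} \end{pmatrix}_{n \times s}$$

where $r \le p$, $s \le n$, $n_1 + n_2 + \cdots + n_r = s$, $n_1 + n_2 + \cdots + n_p = n$, and $H_{ij}$
is $n_i \times n_j$ for $i, j = 1, \dots, p$. Let $\mathrm{rank}(R) = n_1$. Also, without any
loss of generality, let us assume that the matrix $F$ has the block lower Hessenberg
form:

$$F = \begin{pmatrix} (F_{11})_{n_1 \times n_1} & (F_{12})_{n_1 \times n_2} & & 0 \\ (F_{21})_{n_2 \times n_1} & (F_{22})_{n_2 \times n_2} & \ddots & \\ \vdots & & \ddots & F_{r-1,r} \\ F_{r1} & \cdots & F_{r,r-1} & F_{rr} \end{pmatrix}_{s \times s}$$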

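It is easy to see that the $q$th powers of $H$ and $F$ are given by ($q < p$)

$$H^q = \big(H^{(q)}_{ij}\big) = \begin{pmatrix} H^{(q)}_{11} & \cdots & \cdots & H^{(q)}_{1p} \\ \vdots & & & \vdots \\ H^{(q)}_{q+1,1} & & & \vdots \\ & \ddots & & \vdots \\ 0 & & H^{(q)}_{p,p-q} & H^{(q)}_{pp} \end{pmatrix}$$

where $H^{(q)}_{q+1,1} = (H_{q+1,q} \cdot H_{q,q-1} \cdots H_{32} H_{21})_{n_{q+1} \times n_1}$, and

$$F^q = \big(F^{(q)}_{ij}\big) = \begin{pmatrix} F^{(q)}_{11} & F^{(q)}_{12} & \cdots & F^{(q)}_{1,q+1} & & 0 \\ F^{(q)}_{21} & F^{(q)}_{22} & \cdots & & F^{(q)}_{2,q+2} & \\ \vdots & & & & & \ddots \\ \vdots & & & & & F^{(q)}_{r-q,r} \\ \vdots & & & & & \vdots \\ F^{(q)}_{r1} & \cdots & & & \cdots & F^{(q)}_{rr} \end{pmatrix}$$

where $F^{(q)}_{1,q+1} = (F_{12} F_{23} \cdots F_{q,q+1})_{n_1 \times n_{q+1}}$.

In the next theorem, we show that the full-rankness of the unique solution $X$ of
the Sylvester equation

$$HX - XF = R$$

can be checked knowing only powers of certain block matrices of $H$ and $F$.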
Theorem 3 Let

$$HX - XF = R, \qquad (10.4)$$

where $(H, F)$ and $R$ are as given above. Assume that $(H, R)$ is controllable, $(R, F)$
is observable, $R$ has full rank $n_1$, and $H$ and $F$ do not have a common eigenvalue.
Then Eq. (10.4) has a unique full-rank solution if and only if

$$\mathrm{rank} \begin{pmatrix} R_1 F^{(k)}_{1,k+1} & \cdots & \cdots & \cdots \\ & H_{21} R_1 F^{(k-1)}_{1k} & \cdots & \cdots \\ & & \ddots & \vdots \\ & & H^{(k-1)}_{k1} R_1 F_{12} & H^{(k-1)}_{k1} R_1 (F_{11} + a_1 I) + H^{(k)}_{k1} R_1 \\ 0 & & & H^{(k)}_{k+1,1} R_1 \end{pmatrix} = s \qquad (10.5)$$

where $(k + 1)$ is the degree of the minimal or the characteristic polynomial of $F$
and the $a_i$'s are the coefficients of the characteristic (minimal) polynomial of $F$.

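Proof Define $\tilde{H} = (R, HR, \cdots, H^kR)$ and

$$\tilde{F} = \begin{pmatrix} I & a_1 I & \cdots & a_k I \\ & I & \cdots & a_{k-1} I \\ & & \ddots & \vdots \\ 0 & & & I \end{pmatrix} \begin{pmatrix} F^k \\ F^{k-1} \\ \vdots \\ I \end{pmatrix}$$

Theorem 3 will then follow from Theorem 2 if we can show that
$\mathrm{rank}(\tilde{H} \cdot \tilde{F}) = s$.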

By direct computations, we have 



H 



f Ri ■•• HnRi ••• H\^Ri ••■ 0\ 
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where 



F = 



(I 


Vo 



a\I 02! ■ ■ ■ a^I \ 

/ ail ■ ■ ■ ak-\I 



Fi 



/ A,+n. 



(10.6) 
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By direct multiplication of the two matrices F and H and deleting zero rows and 
zero columns, we can write 
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rank(// ■ F) = rank 
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"1(<!+1),S 

The 2nd matrix of Eq. (10.7) after column permutations can be written as 













7(^-1) 



/■=0 

/t-1 
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(10.8) 
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Since rank of a matrix is not altered by multiplication with a permutation matrix, 
from above, we then have 
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iank{H,F) = rank 
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The result of Theorem 3 now follows from (10.9). 
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h[\ '^Ri{Fn+aiI) 



+HIaRi 



f^k+i.i^i 



/ 

(10.9) 



As stated in the Introduction, a variation of the Sylvester equation (10.1), known as
the Sylvester-observer equation, arises in control theory in the context of
constructing the Luenberger observer. The Sylvester-observer equation has the form:

$$XA - FX = GC,$$

where the matrix $C$ is the output matrix of the linear control system:

$$\dot{x}(t) = Ax(t) + Bu(t)$$
$$y(t) = Cx(t)$$

and is given in advance, and the matrices $X$, $F$ and $G$ need to be computed under
certain assumptions. In this section we consider the dual of this equation; namely

$$AX - XF = CG \qquad (10.10)$$

If $G$ is chosen to be $G = [I \ \cdots \ 0]$ then Eq. (10.10) becomes

$$AX - XF = R = (C, 0, \dots, 0).$$

Since a necessary condition for the solution $X$ to have full rank is that $(A, R)$ is
controllable, we can assume that $(A, R)$ has been transformed into a controller-Hessenberg
form $(H, \tilde{R})$ where $H = QAQ^T$ and $\tilde{R} = QR$ have the forms as before.
We therefore concentrate on finding a full-rank solution of

$$H\tilde{X} - \tilde{X}F = \tilde{R} \qquad (10.11)$$

where $\tilde{X} = QX$.

If $F$ is chosen as a block bidiagonal matrix as given below

$$F = \begin{pmatrix} F_{11} & F_{12} & & \\ & F_{22} & \ddots & \\ & & \ddots & F_{k,k+1} \\ & & & F_{k+1,k+1} \end{pmatrix}$$

where $F_{i,i+1}$, $i = 1, \dots, k$, has full rank, and $k + 1$ is the degree of the minimal
(characteristic) polynomial of $F$ such that

$$\mathrm{rank}\big(H^{(k)}_{k+1,1} R_1\big) + \mathrm{rank}\big(H^{(k-1)}_{k1} R_1 F_{12}\big) + \cdots + \mathrm{rank}\big(H_{21} R_1 F^{(k-1)}_{1k}\big) + \mathrm{rank}\big(R_1 F^{(k)}_{1,k+1}\big) = s,$$

then it follows from Theorem 3 that

Theorem 4 The Sylvester-observer equation (10.11)

$$HX - XF = R$$

has a full-rank solution $X$ if and only if

$$\mathrm{rank}\big(H^{(k)}_{k+1,1} R_1\big) + \mathrm{rank}\big(H^{(k-1)}_{k1} R_1 F_{12}\big) + \cdots + \mathrm{rank}\big(H_{21} R_1 F^{(k-1)}_{1k}\big) + \mathrm{rank}\big(R_1 F^{(k)}_{1,k+1}\big) = s \qquad (10.12)$$



Derivation of the Result by de Souza and Bhattacharyya. If $k = p - 1$, i.e.
$\mathrm{rank}(R) = 1$, then the necessary and sufficient condition of de Souza and
Bhattacharyya [11] for (10.11), where $R$ has rank 1, follows immediately as a special
case of Theorem 4.

Corollary 1 [11] Let $(H, R)$ be a controllable pair and $\mathrm{rank}(R) = 1$. Then

HX-XF=\q : : (10.13) 

Vo 0/ 



has a unique full-rank solution if and only if $\mathrm{rank}(\tilde{H}) = s$, where
$\tilde{H} = (R, HR, \cdots, H^kR)$ as in the proof of Theorem 3.

Since $(H, R)$ is controllable, there exists a $k$ such that $\mathrm{rank}(\tilde{H}) = s$. Thus by
Theorem 3, Eq. (10.13) has a full-rank solution if and only if $(H, R)$ is controllable.

D 

Corollary 2 If $n_1$ and $k$ are such that $n_1(k + 1) = s$ and $\mathrm{rank}(\tilde{H}) = s$, where
$\tilde{H} = (R, HR, \cdots, H^kR)$, then Eq. (10.11) has a unique full-rank solution if and
only if $F$ is nonsingular.

Proof The proof follows from Eq. (10.9). □

Now observe that $\mathrm{rank}\big(H^{(k)}_{k+1,1} R_1\big) = \mathrm{rank}(H_{k+1,k} \cdot H_{k,k-1} \cdots H_{32} H_{21} \cdot R_1) =
\min(n_{k+1}, n_1)$.

If $n_{k+1} \ge n_1$ for $k = 1, \dots, p - 1$, then we have the following result.

Corollary 3 If ni(^i>ni — s,k ^ 1,2, .. .,p — I, then there always exists a 
unique full-rank solution to the following equation 



HX-XF 



* 



Vo/ 



;io.i5) 



Proof From Theorem 3 we know that X is a unique full-rank solution if and only if 
rank(//, F) = s. 
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rank(//, F) = rank 
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(10.16) 



where ni + M2 + • • • + «p = « and ao — I, l<fe+lis the degree of the charac- 
teristic or minimal polynomial of F, and rank (H) >s — rii, rank (//}^i ^ ■ Ri) = 
rank {Hk+iM ■ H^.k-i ■ ■ ■ H32 ■ H21 ■ Ri) == min(ni+i,Mi). 

From (10.16) it is clear that if «i+i > n^, then there always exists a unique full- 
rank solution. n 



10.3.1 Derivation of the Van Dooren Result
on Sylvester-Observer Equation

Van Dooren [4] proposed an algorithm for constructing a full-rank solution of a 
Sylvester-observer equation using observer-Hessenberg form. 

Using our result of Corollary 3 we will now explain why his algorithm always 
yielded a full-rank solution. 

Van Dooren considered the Sylvester-observer equation in the form:

$$XA - FX = GC \qquad (10.17)$$

Using the observer-Hessenberg form of $(A, C)$, Eq. (10.17) reduces to

$$XH - FX = G(0, \dots, R_1) = (0, \dots, R_1), \qquad (10.18)$$

where $R_1$ is an upper triangular matrix with full rank $n_k$, $G = I$, $H$ is in block
upper Hessenberg form and $F$ is in block lower triangular form as described below:
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and 



In order to have a full-rank solution $X$, Van Dooren's assumption was
$\mathrm{rank}(R_1) = \mathrm{rank}(H_{i,i-1}) = n_k$ and $n_i \le n_{i-1}$ for $i = 2, \dots, k$. To put Eq. (10.18) in our
setting we first take the transpose of (10.18)
and then multiply both sides by a suitable permutation matrix to obtain the
following:
According to the result of Corollary 3, this Eq. (10.20) has a full-rank solution if
and only if $n_k \le n_1$. Thus in particular if $n_k = n_1$, Eq. (10.17) always has a full-rank
solution.



10.4 Algorithm and Numerical Examples 

Based on the above discussion, we now give an algorithm for choosing $k$ and
the diagonal blocks of $F$ so that the Sylvester-observer equation

$$AX - XF = (C \ 0 \ \cdots \ 0) \qquad (10.21)$$

has a full-rank solution.

Algorithm 1 Constructing a full-rank solution X and F

Input $A \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{n \times n_1}$, $s$ = size of $F$

Output A full-rank solution $X \in \mathbb{R}^{n \times s}$ and an upper block bidiagonal matrix $F_{s \times s}$.

Assumptions $(A, C)$ is controllable, $\mathrm{rank}(C) = n_1$.

Step 1 Transform $(A, C)$ to the controller-Hessenberg form $(H, \tilde{R})$,
       where (i) the number of subdiagonal blocks of $H$ is $p$,
       (ii) the size of the diagonal block $H_{ii}$ is $n_i$ for $i = 1, \dots, (p + 1)$,
       (iii) $\tilde{R} = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}$.
Step 2 Form $R = [\tilde{R} \ \cdots \ 0]_{(n \times s)}$
Step 3 Compute $k$:
       set $S = \mathrm{rank}(R_1) = n_1$
       if $S = s$ then $k = 0$, stop
       for $j = 1$ to $p$
           $s_j = \min(n_{j+1}, n_1)$
           $S = S + s_j$
           if $S \ge s$, then $k = j$, stop
       end
Step 4 Construct the upper bidiagonal matrix $F$ as described in Sect. 10.3 having
       $(k + 1)$ diagonal blocks such that the degree of the minimal (characteristic)
       polynomial of $F$ is $k + 1$.
Step 5 Solve $HY - YF = R$ using a standard Sylvester equation solver such as the
       MATLAB function LYAP, based on the well-known Hessenberg-Schur
       algorithm of Golub et al. [13].
Step 6 Construct $X = QY$, where $Q$ is the orthogonal matrix used in Step 1 to
       transform $(A, C)$ to $(H, R)$
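Step 3 of Algorithm 1 is a simple accumulation over the block sizes and can be sketched directly. In the code below the function name `compute_k` is hypothetical, `block_sizes` holds the assumed diagonal block sizes n_1, ..., n_(p+1), and the stopping test uses "greater than or equal", which is the reading consistent with the numerical example that follows; the sample block sizes are an illustrative choice summing to 20, not values taken from the text.

```python
def compute_k(block_sizes, s):
    """Step 3 of Algorithm 1: smallest k with n_1 + sum_{j<=k} min(n_{j+1}, n_1) >= s.

    block_sizes = [n_1, ..., n_(p+1)]; s is the size of F.
    """
    n1 = block_sizes[0]
    total = n1                         # S = rank(R_1) = n_1
    if total == s:
        return 0
    for j in range(1, len(block_sizes)):
        total += min(block_sizes[j], n1)
        if total >= s:
            return j
    raise ValueError("s is too large for the given block structure")

# Illustrative block sizes (assumed) for a 20 x 20 matrix with n_1 = 3.
sizes = [3, 3, 3, 3, 3, 3, 2]
k = compute_k(sizes, 9)
print(k)
```

The returned k fixes the number of diagonal blocks, k + 1, that the bidiagonal matrix in Step 4 must have.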
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10.4.1 An Illustrative Numerical Example 

We illustrate our algorithm by taking $A$ to be a random matrix of order $20 \times 20$ and $C$
a random matrix of order $20 \times 3$, with $s = 9$.



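Step 1

$$H = \begin{pmatrix} H_{11} & H_{12} & \cdots & H_{18} \\ H_{21} & H_{22} & \cdots & H_{28} \\ & \ddots & \ddots & \vdots \\ 0 & & H_{87} & H_{88} \end{pmatrix}$$

where the block size of $H_{i+1,i}$ is (3, 3), $i = 1, \dots, 6$, the block size of $H_{87}$ is
(2, 3), and the number of subdiagonal blocks of $H$ is $p = 7$.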



Step 2 $R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}_{20 \times 9}$, where $R_1$ = 




-0.2114 

-2.9505 





-0.0485 \ 
-2.4689 
-2.0061 / 



Step 3 It is easy to see that when $k = 2$, $S = 9 = s$. 

Step 4 Choose $F$ with degree of minimal polynomial = 3: 

$$F = \begin{pmatrix} 2I & I & 0 \\ 0 & 3I & I \\ 0 & 0 & 4I \end{pmatrix}_{9 \times 9},$$

where $\lambda = 2, 3, 4$ are not eigenvalues of $H$. 
Step 5 Solve for $Y$: $HY - YF = R$. Verify rank$(Y) = 9$. 
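
For readers who want to experiment with Step 5 on a small case, the following sketch solves HY - YF = R by Kronecker vectorization instead of the Hessenberg-Schur method named above. Everything here is an assumption made for illustration: the helper name `solve_sylvester_kron`, the sizes, and the test matrix H, which is a generic matrix with eigenvalues near -5 rather than an actual controller-Hessenberg form.

```python
import numpy as np


def solve_sylvester_kron(H, F, R):
    """Solve H Y - Y F = R via vec(); dense, so only suitable for small sizes."""
    n, s = R.shape
    # vec(H Y) = (I_s kron H) vec(Y) and vec(Y F) = (F^T kron I_n) vec(Y),
    # with vec() stacking columns (Fortran order).
    K = np.kron(np.eye(s), H) - np.kron(F.T, np.eye(n))
    y = np.linalg.solve(K, R.flatten(order="F"))
    return y.reshape((n, s), order="F")


rng = np.random.default_rng(0)
n, n1, s = 8, 2, 6                       # hypothetical sizes
# Gershgorin discs keep the eigenvalues of H close to -5, away from 2, 3, 4.
H = -5.0 * np.eye(n) + 0.1 * rng.random((n, n))
# Upper block bidiagonal F with diagonal blocks 2I, 3I, 4I.
J = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [0.0, 0.0, 4.0]])
F = np.kron(J, np.eye(n1))
# Right-hand side with a nonzero leading n1 x n1 block and zeros elsewhere.
R = np.zeros((n, s))
R[:n1, :n1] = np.triu(rng.random((n1, n1)) + 1.0)

Y = solve_sylvester_kron(H, F, R)
residual = np.linalg.norm(H @ Y - Y @ F - R)
rank_Y = np.linalg.matrix_rank(Y)        # inspect; full rank is what Step 5 checks
```

The dense Kronecker system has ns unknowns, so this approach is only a checking device; for realistic sizes a Hessenberg-Schur or Bartels-Stewart solver is the appropriate tool.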



10.4.2 Results on Numerical Experiment 

We now present the results of our numerical experiments on a benchmark example 
taken from Higham [14]. Here $A$ is a pentadiagonal Toeplitz matrix, $C$ is randomly 
chosen, and the eigenvalues of $F$ are chosen to be disjoint from those 
of $A$. Define Residual $= \|HY - YF - R\|$. 

Example 1 Pentadiagonal Toeplitz matrix (n = 100) 



Rank(R)   s = n - n_1   Deg of minimal poly of F   Rank(Y)   Residual 
13        87            7                          87        2.466913e-014 
14        86            7                          86        1.740644e-014 
15        85            6                          85        3.041122e-014 
Example 2 Pentadiagonal Toeplitz matrix (n = 300) 

Rank(R)   s = n - n_1   Deg of minimal poly of F   Rank(Y)   Residual 
38        262           7                          262       2.959402e-014 
39        261           7                          261       2.967265e-014 
40        260           7                          260       7.285381e-014 


Example 3 Pentadiagonal Toeplitz matrix (n = 500) 

Rank(R)   s = n - n_1   Deg of minimal poly of F   Rank(Y)   Residual 
63        437           7                          437       7.452755e-014 
64        436           7                          436       4.704602e-014 
65        435           8                          435       7.495685e-014 



10.5 Conclusion 

A necessary condition for the existence of a full-rank solution to a Sylvester 
equation, and a necessary and sufficient condition when the right-hand side matrix 
has rank one, have long been known. However, the characterization of a 
full-rank solution when the right-hand side matrix has arbitrary rank was an open 
problem for a long time. In 1997 it was settled by Datta, Hong and Lee [15]. 
Unfortunately, that condition turned out to be of mostly theoretical interest and is 
not readily applicable in a practical computational setting. 

On the other hand, a variation of the Sylvester equation, called the Sylvester- 
observer equation, $AX - XF = CG$, arises in practical applications in the context 
of designing a Luenberger observer in control theory, where it is crucial that the 
matrix $X$ has full rank. 

A criterion for choosing the matrix $F$, based on the controller-Hessenberg 
reduction of $(A, C)$, which guarantees that the solution matrix $X$ has full rank, 
is presented in this paper, and an associated algorithm is described. 

It is hoped that these results will be of some practical value to the control 
theorists and engineers. 
Chapter 11 

On Symmetric and Skew-Symmetric 

Solutions to a Procrustes Problem 

Yuan-Bei Deng and Daniel Boley 



Abstract Using the projection theorem in a Hilbert space, the quotient singular 
value decomposition (QSVD) and the canonical correlation decomposition (CCD) 
from matrix theory as efficient tools, we obtain explicit analytical expressions 
for the optimal approximation solutions of the symmetric and skew-symmetric 
least-squares problems of the linear matrix equation $AXB = C$. This can lead to 
new algorithms to solve such problems. 



11.1 Introduction 

Certain least-squares problems of linear matrix equations are called Procrustes 
problems [1, 13]. Unconstrained and constrained least-squares problems have 
been of interest for many applications, including particle physics and geology, 
the inverse Sturm-Liouville problem [11], inverse problems of vibration theory [6], 
control theory, digital image and signal processing, photogrammetry, finite 
elements, and multidimensional approximation [8]. Penrose [2, 21] first considered 
the linear matrix equation 

$$AX = B \tag{11.1}$$

and obtained its general solution and least-squares solution by making use of the 
Moore-Penrose generalized inverse. Sun [22] then obtained the least-squares 
solution and the related optimal approximation solution of Eq. (11.1) when $X$ is a 
real matrix. The least-squares problems of Eq. (11.1) were discussed in 1988 by 
Higham [13] and Sun [23] for the case in which the solution matrix $X$ is constrained 
to be a real symmetric matrix, and Sun also discussed the related symmetric optimal 
approximation problem of Eq. (11.1) in [23]. This work was further extended by 
Andersson and Elfving [1]. For more information about linear matrix equations, 
see e.g. [12, 15, 16, 18]. 

In this paper, the least-squares problems 

$$\min_{X \in \Omega} \|AXB - C\|_F, \tag{11.2}$$

with $\Omega$ the symmetric or skew-symmetric cone or possibly the real matrix space, and the 
related optimal approximation problems are considered. As can be seen in the 
following discussion, the least-squares problem (11.2) 
always has a solution over the symmetric or skew-symmetric cone. Obviously, if 
the equation $AXB = C$ is consistent, then the least-squares problem (11.2) and the 
equation $AXB = C$ have the same solution set. The general expressions for the 
symmetric solution of the equation $AXB = C$ were obtained by using the generalized 
singular value decomposition of matrices (GSVD) by Chu [4] in 1989 and Hua [5] in 
1990, respectively. Fausett and Fulton [8] and Zha [25] considered the least-squares 
problems (11.2) in the real matrix space, while the symmetric and the 
skew-symmetric least-squares solutions of (11.2) have been derived by Deng et al. [7]. 
Liao et al. [17] were the first to use the projection method for finding the best approximate 
solutions of the matrix equation $AXB + CYD = E$. However, the optimal approximation 
solutions for the symmetric and skew-symmetric Procrustes problems of this equation 
remain unsolved. Therefore, in the following we consider the 
optimal approximation solutions of the constrained least-squares problems related to 
(11.2), and we obtain explicit analytical expressions of the optimal approximation 
solutions for the symmetric and skew-symmetric least-squares problems of the form 
(11.2). Of course, when the least-squares problem (11.2) has a unique solution, 
there is no need to discuss the related optimal approximation problem. 

In this paper we always suppose that $R^{m \times n}$ is the set of all $m \times n$ real matrices, 
$SR^{n \times n}$, $AR^{n \times n}$ and $OR^{n \times n}$ are the sets of all symmetric, skew-symmetric and 
orthogonal matrices in $R^{n \times n}$, respectively, $A * B$ represents the Hadamard product 
of $A$ and $B$ (cf. [14]), and $\|Y\|_F$ denotes the Frobenius norm of a real matrix $Y$, 
defined as 

$$\|Y\|_F^2 = \langle Y, Y \rangle = \sum_{i,j} y_{ij}^2;$$

here the inner product is given by $\langle A, B \rangle = \mathrm{trace}(A^T B)$, and $R^{m \times n}$ becomes a 
Hilbert space with this inner product. 
The specific problems treated in this paper are stated as follows. 

Problem I Given matrices $A \in R^{m \times n}$, $B \in R^{n \times p}$, $C \in R^{m \times p}$ and $X_f \in R^{n \times n}$, let 

$$S_E = \{X \mid X \in SR^{n \times n},\ \|AXB - C\|_F = \min\}. \tag{11.3}$$

Then find $X_e \in S_E$ such that 

$$\|X_e - X_f\|_F = \min_{X \in S_E} \|X - X_f\|_F. \tag{11.4}$$

Problem II Given matrices $A \in R^{m \times n}$, $B \in R^{n \times p}$, $C \in R^{m \times p}$ and $X_f \in R^{n \times n}$, let 

$$S_A = \{X \mid X \in AR^{n \times n},\ \|AXB - C\|_F = \min\}. \tag{11.5}$$

Then find $X_a \in S_A$ such that 

$$\|X_a - X_f\|_F = \min_{X \in S_A} \|X - X_f\|_F. \tag{11.6}$$



We first introduce some results about the quotient singular value decomposition 
(QSVD) and the canonical correlation decomposition (CCD) of a pair of matrices, 
and the projection theorem in a Hilbert space, which are essential tools for the 
problems to be discussed, and then give a mechanical model as the practical 
background of the above mentioned problems. 

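QSVD Theorem [3] Let $A \in R^{m \times n}$, $B \in R^{n \times p}$. Then there exist orthogonal 
matrices $U \in OR^{m \times m}$, $V \in OR^{p \times p}$ and a nonsingular matrix $Y \in R^{n \times n}$ such that 

$$A = U \Sigma_1 Y^{-1}, \qquad B^T = V \Sigma_2 Y^{-1}, \tag{11.7}$$

where 

$$\Sigma_1 = \begin{pmatrix} I_{r'} & 0 & 0 & 0 \\ 0 & S & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \tag{11.8}$$

with row dimensions $r', s', m - r' - s'$ and column dimensions $r', s', t', n - k'$, 

$$\Sigma_2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & I_{s'} & 0 & 0 \\ 0 & 0 & I_{t'} & 0 \end{pmatrix}, \tag{11.9}$$

with row dimensions $p + r' - k', s', t'$ and column dimensions $r', s', t', n - k'$, and with 

$$\begin{aligned} k' &= \mathrm{rank}(A^T, B), & s' &= \mathrm{rank}(A) + \mathrm{rank}(B) - k', \\ r' &= k' - \mathrm{rank}(B), & S &= \mathrm{diag}(\sigma_1, \ldots, \sigma_{s'}), \\ t' &= k' - \mathrm{rank}(A), & \sigma_i &> 0 \ (i = 1, \ldots, s'). \end{aligned} \tag{11.10}$$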
When $A$ and $B^T$ are of full column rank, i.e. $r(B) = r(A) = n$, then $r' = 0$, $s' = n$, 
$k' = n$, and 

$$\Sigma_1 = \begin{pmatrix} S \\ 0 \end{pmatrix}, \qquad \Sigma_2 = \begin{pmatrix} 0 \\ I_n \end{pmatrix}. \tag{11.11}$$

The QSVD as given above differs from the GSVD of $(A, B^T)$ only in the way the 
second block columns of (11.8) and (11.9) are scaled. The diagonal matrix $S$ in 
(11.8) and the identity matrix $I_{s'}$ in (11.9) are scaled in the GSVD to be diagonal matrices 
$S$ and $C$ such that $S^2 + C^2 = I$, and $Y$ is adjusted accordingly. This particular 
scaling simplifies some of the later formulas in this paper. See details in [9, 20]. 

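The canonical correlation decomposition of the matrix pair $(A^T, B)$ is given by 
the following theorem. 

CCD Theorem [10] Let $A \in R^{m \times n}$, $B \in R^{n \times p}$, and assume that $g = \mathrm{rank}(A)$, 
$h = \mathrm{rank}(B)$, $g \ge h$. Then there exist an orthogonal matrix $Q \in OR^{n \times n}$ and 
nonsingular matrices $X_A \in R^{m \times m}$, $X_B \in R^{p \times p}$ such that 

$$A^T = Q[\Sigma_A, 0] X_A^{-1}, \qquad B = Q[\Sigma_B, 0] X_B^{-1}, \tag{11.12}$$

where $\Sigma_A \in R^{n \times g}$ and $\Sigma_B \in R^{n \times h}$ are of the forms 

$$\Sigma_A = \begin{pmatrix} I_i & 0 & 0 \\ 0 & \Lambda_j & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & \Delta_j & 0 \\ 0 & 0 & I_t \end{pmatrix}, \qquad \Sigma_B = \begin{pmatrix} I_h \\ 0 \end{pmatrix}, \tag{11.13}$$

with $i + j + t = g$, $\Sigma_A$ having partitioned row dimensions $i, j, t, n - g - j - t, j, t$ 
and column dimensions $i, j, t$, and $\Sigma_B$ having partitioned row dimensions $h, n - h$. Here, 

$$\Lambda_j = \mathrm{diag}(\lambda_{i+1}, \ldots, \lambda_{i+j}), \qquad 1 > \lambda_{i+1} \ge \cdots \ge \lambda_{i+j} > 0,$$
$$\Delta_j = \mathrm{diag}(\delta_{i+1}, \ldots, \delta_{i+j}), \qquad 0 < \delta_{i+1} \le \cdots \le \delta_{i+j} < 1,$$
$$\lambda_{i+1}^2 + \delta_{i+1}^2 = 1, \ \ldots, \ \lambda_{i+j}^2 + \delta_{i+j}^2 = 1, \quad \text{i.e.,} \quad \Lambda_j^2 + \Delta_j^2 = I,$$

and 

$$\begin{aligned} i &= \mathrm{rank}(A) + \mathrm{rank}(B) - \mathrm{rank}[A^T, B], \\ j &= \mathrm{rank}[A^T, B] + \mathrm{rank}(AB) - \mathrm{rank}(A) - \mathrm{rank}(B), \\ t &= \mathrm{rank}(A) - \mathrm{rank}(AB). \end{aligned}$$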
Notice that the QSVD and the CCD are different. In the QSVD, the matrices $A$ 
and $B^T$ have the same column dimension, the right nonsingular matrices are the 
same $Y$ in (11.7), and the left orthogonal matrices are $U \in OR^{m \times m}$ and $V \in OR^{p \times p}$; 
while in the CCD, the matrices $A^T$ and $B$ have the same row dimension, the right 
nonsingular matrices are $X_A^{-1}$ and $X_B^{-1}$ in (11.12), and the left orthogonal matrices are 
the same $Q$. 

The projection theorem in a Hilbert space is also important for the problems to 
be solved. 

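Lemma 1 [19, Theorem 5.14.4, p. 286]. Let $\mathcal{H}$ be a Hilbert space, and let $\mathcal{M}$ be a 
closed linear subspace of $\mathcal{H}$. Let $x_0 \in \mathcal{H}$ and define 

$$\delta = \inf\{\|x_0 - y\| : y \in \mathcal{M}\}.$$

Then there is one (and only one) $y_0 \in \mathcal{M}$ such that 

$$\|x_0 - y_0\| = \delta.$$

Moreover, $x_0 - y_0 \perp \mathcal{M}$, that is, $\langle x_0 - y_0, y \rangle = 0$ for all $y \in \mathcal{M}$. Furthermore, $y_0$ is 
the only point in $\mathcal{M}$ such that $x_0 - y_0 \perp \mathcal{M}$. 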
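
As a small numerical illustration of the projection theorem in the setting used in this chapter, the sketch below takes the Hilbert space to be the real 4 x 4 matrices with the trace inner product and the closed subspace to be the symmetric matrices. The choice of subspace, the sizes and the variable names are assumptions made only for this example; the projection of a matrix onto the symmetric matrices is its symmetric part.

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.standard_normal((4, 4))       # the point x_0
Y0 = 0.5 * (X0 + X0.T)                 # its projection y_0 onto symmetric matrices
resid = X0 - Y0                        # x_0 - y_0, a skew-symmetric matrix

# An arbitrary element y of the subspace
S = rng.standard_normal((4, 4))
S = S + S.T

inner = np.trace(resid.T @ S)          # <x_0 - y_0, y>, expected to vanish
dist_proj = np.linalg.norm(X0 - Y0)    # distance from x_0 to y_0
dist_other = np.linalg.norm(X0 - (Y0 + 0.1 * S))   # distance to another point
```

The inner product vanishes up to rounding, and moving away from Y0 inside the subspace can only increase the Frobenius distance to X0.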



11.2 The Solution of Problem I 

In this section, the explicit expression for the solution of Problem I is derived. Our 
approach is based on the projection theorem in a Hilbert space, and also on the 
QSVD and CCD of matrices. It can be divided into three 
parts. First, we characterize the symmetric solutions $X_0$ of the least-squares 
problem (11.2) by using the QSVD; then, by utilizing the general form of the 
solution $X_0$ and the projection theorem, we transform the least-squares problem 
into the problem of solving a consistent matrix equation; and finally, we find the 
optimal approximate solution of the matrix equation by making use of the CCD. 

Without loss of generality, we suppose that rank$(A) \ge$ rank$(B)$. Instead of 
considering the solution of Problem I directly, we will find a matrix $C_0$, and then 
transform Problem I into the following equivalent problem. 

Problem I$_0$ Given matrices $A \in R^{m \times n}$, $B \in R^{n \times p}$, $C_0 \in R^{m \times p}$ and $X_f \in R^{n \times n}$, let 

$$S_{E_0} = \{X \mid X \in SR^{n \times n},\ AXB = C_0\}. \tag{11.14}$$

Then find $X_e \in S_{E_0}$ such that 

$$\|X_e - X_f\|_F = \min_{X \in S_{E_0}} \|X - X_f\|_F. \tag{11.15}$$

In the following we use the projection theorem on $R^{m \times p}$ to prove that the two 
problems are equivalent. 
Theorem 1 Given $A \in R^{m \times n}$, $B \in R^{n \times p}$, $C \in R^{m \times p}$, let $X_0$ be any solution of (11.2) 
with $\Omega$ the symmetric cone, and define 

$$C_0 = A X_0 B. \tag{11.16}$$

Then the matrix equation 

$$AXB = C_0 \tag{11.17}$$

is consistent in $SR^{n \times n}$, and the symmetric solution set $S_{E_0}$ of the matrix equation 
(11.17) is the same as the symmetric solution set $S_E$ of the least-squares problem 
(11.2). 

Proof Let 

$$\mathcal{L} = \{Z \mid Z = AXB,\ X \in SR^{n \times n}\}. \tag{11.18}$$

Then $\mathcal{L}$ is obviously a linear subspace of $R^{m \times p}$. Because $X_0$ is a symmetric solution 
of the least-squares problem (11.2), from (11.16) we see that $C_0 \in \mathcal{L}$ and 

$$\|C_0 - C\|_F = \|A X_0 B - C\|_F = \min_{X \in SR^{n \times n}} \|AXB - C\|_F = \min_{Z \in \mathcal{L}} \|Z - C\|_F.$$

Then by Lemma 1 we have 

$$(C_0 - C) \perp \mathcal{L} \quad \text{or} \quad (C_0 - C) \in \mathcal{L}^{\perp}.$$

Next, for all $X \in SR^{n \times n}$, $AXB - C_0 \in \mathcal{L}$. It then follows that 

$$\|AXB - C\|_F^2 = \|(AXB - C_0) + (C_0 - C)\|_F^2 = \|AXB - C_0\|_F^2 + \|C_0 - C\|_F^2.$$

The right-hand side is smallest exactly when $AXB = C_0$. Hence $S_E = S_{E_0}$, and the 
conclusion of the theorem is true. □ 
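
The orthogonal splitting used in this proof can be checked numerically. The sketch below is an illustration only: it finds one symmetric least-squares solution by brute force, expanding X in a basis of symmetric matrices and calling a dense least-squares solver, which is not the QSVD-based construction developed later in this section. The helper `sym_basis`, the sizes and the random data are all assumptions.

```python
import numpy as np


def sym_basis(n):
    """Return a basis of the space of symmetric n x n matrices."""
    basis = []
    for k in range(n):
        for l in range(k, n):
            E = np.zeros((n, n))
            E[k, l] = E[l, k] = 1.0
            basis.append(E)
    return basis


rng = np.random.default_rng(2)
m, n, p = 5, 3, 4                        # hypothetical sizes
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((m, p))

basis = sym_basis(n)
# Columns are the flattened matrices A E B; they span {A X B : X symmetric}.
G = np.column_stack([(A @ E @ B).ravel() for E in basis])
coef, *_ = np.linalg.lstsq(G, C.ravel(), rcond=None)
X0 = sum(c * E for c, E in zip(coef, basis))   # a symmetric least-squares solution
C0 = A @ X0 @ B

# For any other symmetric X the squared residual splits into two parts.
X = rng.standard_normal((n, n))
X = X + X.T
lhs = np.linalg.norm(A @ X @ B - C) ** 2
rhs = np.linalg.norm(A @ X @ B - C0) ** 2 + np.linalg.norm(C0 - C) ** 2
```

Here `lhs` and `rhs` agree up to rounding, which is the Pythagorean identity that forces every symmetric minimizer to satisfy AXB = C0.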

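Now suppose $A \in R^{m \times n}$, $B \in R^{n \times p}$ and the matrix pair $(A, B^T)$ has the QSVD 
(11.7), and partition $U^T C V$ into the following block matrix: 

$$U^T C V = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}, \tag{11.19}$$

with row dimensions $r', s', m - r' - s'$ and column dimensions $l', s', t'$, 
where $l' = p + r' - k'$, with $r'$, $k'$ defined as in (11.10). Then the expression of $C_0$ 
will be shown in the following theorem. 

Theorem 2 Let $A, B, C$ be given in Problem I, let the matrix pair $(A, B^T)$ have the 
QSVD (11.7), and let $U^T C V$ be partitioned by (11.19). Then for any symmetric 
solution $X_0$ of the least-squares problem (11.2), the matrix $C_0$ determined by 
(11.16) can be expressed in the following form: 

$$C_0 = U C^* V^T, \qquad C^* = \begin{pmatrix} 0 & C_{12} & C_{13} \\ 0 & S X_{22} & C_{23} \\ 0 & 0 & 0 \end{pmatrix}, \tag{11.20}$$

with row dimensions $r', s', m - r' - s'$ and column dimensions $l', s', t'$, where 

$$X_{22} = \Phi * (C_{22}^T S + S C_{22}), \qquad \Phi = (\varphi_{kl}) \in SR^{s' \times s'}, \quad \varphi_{kl} = \frac{1}{\sigma_k^2 + \sigma_l^2}, \quad 1 \le k, l \le s'. \tag{11.21}$$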

Proof From Theorem 2.1 in [7] we know that the symmetric solution of the least- 
squares problem (11.2) can be obtained by using the QSVD of the matrix pair 
$(A, B^T)$, and the general form of the solution is 

$$X_0 = Y \begin{pmatrix} X'_{11} & C_{12} & C_{13} & X'_{14} \\ C_{12}^T & X_{22} & S^{-1} C_{23} & X'_{24} \\ C_{13}^T & (S^{-1} C_{23})^T & X'_{33} & X'_{34} \\ X'^T_{14} & X'^T_{24} & X'^T_{34} & X'_{44} \end{pmatrix} Y^T, \tag{11.22}$$

where $X_{22}$ is given by (11.21) and $X'_{11} \in SR^{r' \times r'}$, $X'_{33} \in SR^{t' \times t'}$, 
$X'_{44} \in SR^{(n-k') \times (n-k')}$, $X'_{14} \in R^{r' \times (n-k')}$, $X'_{24} \in R^{s' \times (n-k')}$, 
$X'_{34} \in R^{t' \times (n-k')}$ are arbitrary matrix blocks. 

Substituting (11.7) and (11.22) into (11.16), we can easily obtain (11.20). □ 

Evidently, (11.20) shows that the matrix $C_0$ in Theorem 2 depends only 
on the matrices $A$, $B$ and $C$, and is independent of the particular symmetric solution $X_0$ of the 
least-squares problem (11.2). Since $C_0$ is known, from Theorem 1 we know that 
Problem I is equivalent to Problem I$_0$. In Problem I$_0$, since $S_{E_0} \ne \emptyset$, we can derive 
the general expression of the elements of $S_{E_0}$ in the following theorem. In this 
theorem, $A \in R^{m \times n}$ and $B \in R^{n \times p}$ are given, $C_0$ is expressed by (11.20), 
$g = \mathrm{rank}(A)$, $h = \mathrm{rank}(B)$, and the matrix pair $(A^T, B)$ has the CCD (11.12). 

We discuss Problem I in two cases. 

Case I $g = h$. Suppose $X \in S_{E_0}$, and partition the symmetric matrix $X^* = Q^T X Q$ 
into the block matrix 

$$X^* = (X_{kl})_{6 \times 6}, \tag{11.23}$$

with the row dimensions (and the corresponding column dimensions) of the blocks being 
$i, j, t, n - g - j - t, j, t$, respectively, and $X_{kl} = X_{lk}^T$, $k, l = 1, 2, \ldots, 6$. Let 
$E = X_A^T C_0 X_B$ and also partition $E$ into the block matrix 

$$E = (E_{kl})_{4 \times 4}, \tag{11.24}$$

with the row dimensions of the blocks being $i, j, t, m - g$ and the column dimensions of 
the blocks being $i, j, t, p - g$, respectively. 

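Theorem 3 In Problem I$_0$, the general form of the elements of $S_{E_0}$ can be 
expressed as 

$$X = Q X^* Q^T = Q \begin{pmatrix} E_{11} & E_{12} & E_{13} & X_{14} & X_{51}^{(0)T} & E_{31}^T \\ E_{12}^T & X_{22} & X_{23} & X_{24} & X_{52}^{(0)T} & E_{32}^T \\ E_{13}^T & X_{23}^T & X_{33} & X_{34} & X_{53}^{(0)T} & E_{33}^T \\ X_{14}^T & X_{24}^T & X_{34}^T & X_{44} & X_{45} & X_{46} \\ X_{51}^{(0)} & X_{52}^{(0)} & X_{53}^{(0)} & X_{45}^T & X_{55} & X_{56} \\ E_{31} & E_{32} & E_{33} & X_{46}^T & X_{56}^T & X_{66} \end{pmatrix} Q^T, \tag{11.25}$$

where $X_{kk} = X_{kk}^T$, $2 \le k \le 6$; $X_{14}, X_{23}, X_{24}, X_{34}, X_{45}, X_{46}$ and $X_{56}$ are arbitrary 
matrices with the associated sizes; and 

$$X_{51}^{(0)} = \Delta_j^{-1}(E_{21} - \Lambda_j E_{12}^T), \quad X_{52}^{(0)} = \Delta_j^{-1}(E_{22} - \Lambda_j X_{22}), \quad X_{53}^{(0)} = \Delta_j^{-1}(E_{23} - \Lambda_j X_{23}).$$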

Proof Suppose $X \in S_{E_0}$; then 

$$AXB = C_0. \tag{11.26}$$

Substitute (11.12) into (11.26) to obtain 

$$\begin{pmatrix} \Sigma_A^T \\ 0 \end{pmatrix} X^* (\Sigma_B, 0) = E, \tag{11.27}$$

then substitute (11.13), (11.23) and (11.24) into (11.27) to obtain 

$$\begin{pmatrix} X_{11} & X_{12} & X_{13} & 0 \\ \Lambda_j X_{21} + \Delta_j X_{51} & \Lambda_j X_{22} + \Delta_j X_{52} & \Lambda_j X_{23} + \Delta_j X_{53} & 0 \\ X_{61} & X_{62} & X_{63} & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = (E_{kl})_{4 \times 4}. \tag{11.28}$$

Because the matrix equation (11.26) is consistent, we can obtain some of the $X_{kl}$ 
from (11.28) directly. Comparing both sides of (11.28), the expression 
(11.25) can be derived by using the symmetry of $X^*$. □ 

The following lemmas are needed for the main results. 

Lemma 2 [24]. For given $J_1, J_2, J_3$ and $J_4 \in R^{m \times n}$, 

$$S_a = \mathrm{diag}(a_1, \ldots, a_m) > 0, \quad S_b = \mathrm{diag}(b_1, \ldots, b_m) > 0,$$
$$S_c = \mathrm{diag}(c_1, \ldots, c_m) > 0, \quad S_d = \mathrm{diag}(d_1, \ldots, d_m) > 0,$$

there exists a unique $W \in R^{m \times n}$ such that 

$$\|S_a W - J_1\|_F^2 + \|S_b W - J_2\|_F^2 + \|S_c W - J_3\|_F^2 + \|S_d W - J_4\|_F^2 = \min,$$

and $W$ can be expressed as (where $*$ denotes the Hadamard product) 

$$W = P * (S_a J_1 + S_b J_2 + S_c J_3 + S_d J_4),$$

where 

$$P = (p_{kl}) \in R^{m \times n}, \quad p_{kl} = 1/(a_k^2 + b_k^2 + c_k^2 + d_k^2), \quad 1 \le k \le m, \ 1 \le l \le n.$$
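Lemma 3 For given $J_1, J_2$ and $J_3 \in R^{s \times s}$, $S_a = \mathrm{diag}(a_1, \ldots, a_s) > 0$, 
$S_b = \mathrm{diag}(b_1, \ldots, b_s) > 0$, $S_c = \mathrm{diag}(c_1, \ldots, c_s) > 0$, there exists a unique symmetric 
matrix $W \in SR^{s \times s}$ such that 

$$f_1 = \|S_a W - J_1\|_F^2 + \|S_b W - J_2\|_F^2 + \|S_c W - J_3\|_F^2 = \min,$$

and $W$ can be expressed as 

$$W = \Psi * (S_a J_1 + J_1^T S_a + S_b J_2 + J_2^T S_b + S_c J_3 + J_3^T S_c), \tag{11.29}$$

where 

$$\Psi = (\psi_{kl}) \in SR^{s \times s}, \quad \psi_{kl} = 1/(a_k^2 + a_l^2 + b_k^2 + b_l^2 + c_k^2 + c_l^2), \quad 1 \le k, l \le s.$$

Proof For $W \in SR^{s \times s}$ the property $w_{kl} = w_{lk}$ $(1 \le k, l \le s)$ holds, and 

$$\begin{aligned} f_1 = {} & \sum_{k=1}^{s} \left[ (a_k w_{kk} - J_{1kk})^2 + (b_k w_{kk} - J_{2kk})^2 + (c_k w_{kk} - J_{3kk})^2 \right] \\ & + \sum_{1 \le k < l \le s} \left[ (a_k w_{kl} - J_{1kl})^2 + (a_l w_{kl} - J_{1lk})^2 + (b_k w_{kl} - J_{2kl})^2 \right. \\ & \left. \qquad + (b_l w_{kl} - J_{2lk})^2 + (c_k w_{kl} - J_{3kl})^2 + (c_l w_{kl} - J_{3lk})^2 \right]. \end{aligned}$$

The function $f_1$ is a continuous and differentiable quadratic convex function 
of the $\frac{1}{2} s(s + 1)$ variables $w_{kl}$; hence $f_1$ attains its minimum value at $\{w_{kl}\}$ when 
$\partial f_1 / \partial w_{kl} = 0$, i.e., 

$$w_{kl} = \frac{a_k J_{1kl} + a_l J_{1lk} + b_k J_{2kl} + b_l J_{2lk} + c_k J_{3kl} + c_l J_{3lk}}{a_k^2 + a_l^2 + b_k^2 + b_l^2 + c_k^2 + c_l^2}, \quad 1 \le k \le l \le s.$$

Therefore $W$ can be expressed by (11.29). □ 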
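
The closed form of Lemma 3 is easy to test numerically. The sketch below reads the formula as W = Psi * (S_a J_1 + J_1^T S_a + S_b J_2 + J_2^T S_b + S_c J_3 + J_3^T S_c), with psi_kl the reciprocal of a_k^2 + a_l^2 + b_k^2 + b_l^2 + c_k^2 + c_l^2, evaluates it on random data, and compares the objective with its value at a nearby symmetric matrix. The function names, the size s = 4 and the random inputs are assumptions for illustration.

```python
import numpy as np


def lemma3_W(a, b, c, J1, J2, J3):
    """Evaluate the Hadamard-product formula for the symmetric minimizer W."""
    Sa, Sb, Sc = np.diag(a), np.diag(b), np.diag(c)
    num = Sa @ J1 + J1.T @ Sa + Sb @ J2 + J2.T @ Sb + Sc @ J3 + J3.T @ Sc
    d = a**2 + b**2 + c**2
    Psi = 1.0 / (d[:, None] + d[None, :])   # entries 1/(a_k^2+a_l^2+...+c_l^2)
    return Psi * num                        # elementwise (Hadamard) product


def objective(W, a, b, c, J1, J2, J3):
    """f_1 = ||S_a W - J_1||^2 + ||S_b W - J_2||^2 + ||S_c W - J_3||^2."""
    return sum(np.linalg.norm(np.diag(v) @ W - J) ** 2
               for v, J in ((a, J1), (b, J2), (c, J3)))


rng = np.random.default_rng(3)
s = 4
a, b, c = (rng.random(s) + 0.5 for _ in range(3))          # positive diagonals
J1, J2, J3 = (rng.standard_normal((s, s)) for _ in range(3))

W = lemma3_W(a, b, c, J1, J2, J3)
f_min = objective(W, a, b, c, J1, J2, J3)

# Perturb W along a random symmetric direction; the objective should grow.
D = rng.standard_normal((s, s))
D = D + D.T
f_pert = objective(W + 1e-3 * D, a, b, c, J1, J2, J3)
```

Because the objective is a strictly convex quadratic in the free entries of W, any symmetric perturbation of the minimizer increases it.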

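Now we state the main theorem; here we still suppose that rank$(A) = $ rank$(B)$. 

Theorem 4 Let matrices $A, B, C$ and $X_f$ be given in Problem I, suppose 
rank$(A) = $ rank$(B)$, and partition the matrix $Q^T X_f Q$ into the block matrix 

$$Q^T X_f Q = (X_{kl}^{(f)})_{6 \times 6}, \tag{11.30}$$

with the same row and column dimensions as $X^*$ of (11.23). Then there is a unique 
solution $X_e$ in Problem I, and $X_e$ can be expressed as 

$$X_e = Q \begin{pmatrix} E_{11} & E_{12} & E_{13} & \{X_{14}^{(f)}\} & X_{51}^{(0)T} & E_{31}^T \\ E_{12}^T & X_{22} & X_{23} & \{X_{24}^{(f)}\} & X_{52}^{(0)T} & E_{32}^T \\ E_{13}^T & X_{23}^T & \{X_{33}^{(f)}\} & \{X_{34}^{(f)}\} & X_{53}^{(0)T} & E_{33}^T \\ \{X_{41}^{(f)}\} & \{X_{42}^{(f)}\} & \{X_{43}^{(f)}\} & \{X_{44}^{(f)}\} & \{X_{45}^{(f)}\} & \{X_{46}^{(f)}\} \\ X_{51}^{(0)} & X_{52}^{(0)} & X_{53}^{(0)} & \{X_{54}^{(f)}\} & \{X_{55}^{(f)}\} & \{X_{56}^{(f)}\} \\ E_{31} & E_{32} & E_{33} & \{X_{64}^{(f)}\} & \{X_{65}^{(f)}\} & \{X_{66}^{(f)}\} \end{pmatrix} Q^T, \tag{11.31}$$

where (using the notation $\{X\}$ to denote the symmetric part of $X$) 

$$\{X_{kl}^{(f)}\} = \tfrac{1}{2}\left(X_{kl}^{(f)} + X_{lk}^{(f)T}\right), \tag{11.32}$$

$$\begin{aligned} X_{22} = \Psi * \big[ & X_{22}^{(f)} + X_{22}^{(f)T} + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} E_{22} - X_{25}^{(f)T}) + (\Delta_j^{-1} E_{22} - X_{25}^{(f)T})^T \Lambda_j \Delta_j^{-1} \\ & + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} E_{22} - X_{52}^{(f)}) + (\Delta_j^{-1} E_{22} - X_{52}^{(f)})^T \Lambda_j \Delta_j^{-1} \big], \end{aligned} \tag{11.33}$$

$$\Psi = (\psi_{kl}) \in R^{j \times j}, \quad \psi_{kl} = \frac{1}{2\left(1 + \lambda_{i+k}^2/\delta_{i+k}^2 + \lambda_{i+l}^2/\delta_{i+l}^2\right)}, \quad 1 \le k, l \le j;$$

$$X_{23} = G * \left[ X_{23}^{(f)} + X_{32}^{(f)T} + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} E_{23} - X_{35}^{(f)T}) + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} E_{23} - X_{53}^{(f)}) \right], \tag{11.34}$$

$$G = (g_{kl}) \in R^{j \times t}, \quad g_{kl} = \tfrac{1}{2} \delta_{i+k}^2, \quad 1 \le k \le j, \ 1 \le l \le t;$$

and 

$$X_{51}^{(0)} = \Delta_j^{-1}(E_{21} - \Lambda_j E_{12}^T), \quad X_{52}^{(0)} = \Delta_j^{-1}(E_{22} - \Lambda_j X_{22}), \quad X_{53}^{(0)} = \Delta_j^{-1}(E_{23} - \Lambda_j X_{23}).$$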



Proof Suppose $X \in S_E = S_{E_0}$. By using (11.25) and (11.30), we have 

$$\begin{aligned} \|X - X_f\|_F^2 = {} & \|X^* - Q^T X_f Q\|_F^2 \\ = {} & \sum_{k=3}^{6} \|X_{kk} - X_{kk}^{(f)}\|_F^2 + \sum_{(k,l) \in \mathcal{I}} \left( \|X_{kl} - X_{kl}^{(f)}\|_F^2 + \|X_{kl}^T - X_{lk}^{(f)}\|_F^2 \right) \\ & + \Big( \|X_{22} - X_{22}^{(f)}\|_F^2 + \|(\Delta_j^{-1}(E_{22} - \Lambda_j X_{22}))^T - X_{25}^{(f)}\|_F^2 \\ & \qquad + \|\Delta_j^{-1}(E_{22} - \Lambda_j X_{22}) - X_{52}^{(f)}\|_F^2 \Big) \\ & + \Big( \|X_{23} - X_{23}^{(f)}\|_F^2 + \|X_{23}^T - X_{32}^{(f)}\|_F^2 + \|(\Delta_j^{-1}(E_{23} - \Lambda_j X_{23}))^T - X_{35}^{(f)}\|_F^2 \\ & \qquad + \|\Delta_j^{-1}(E_{23} - \Lambda_j X_{23}) - X_{53}^{(f)}\|_F^2 \Big) + \alpha_0, \end{aligned} \tag{11.35}$$

where $\mathcal{I} = \{(1,4), (2,4), (3,4), (4,5), (4,6), (5,6)\}$ and $\alpha_0$ is a constant. 

According to (11.35), $\|X - X_f\|_F = \min$ if and only if each of the bracketed terms in 
(11.35) takes its minimum. Noticing that $X_{kk} = X_{kk}^T$, $k = 3, 4, 5, 6$, and making use of 
Lemmas 2 and 3, the results of this theorem can be derived easily. □ 

Case II In the case of $g > h$, we first partition the symmetric matrix $X^* = Q^T X Q$ 
into an $8 \times 8$ block matrix, $X^* = (X_{kl})_{8 \times 8}$, with the row dimensions (and the corresponding 
column dimensions) of the blocks being $i, j, t_1 = h - i - j, t_2 = g - h, n - g - j - t_1 - t_2, 
j, t_1, t_2$, respectively, and $X_{kl} = X_{lk}^T$, $k, l = 1, 2, \ldots, 8$, and let $E = X_A^T C_0 X_B$, which 
is partitioned into a $5 \times 4$ block matrix, $E = (E_{kl})_{5 \times 4}$, with the row dimensions of the 
blocks being $i, j, t_1, t_2, m - g$ and the column dimensions of the blocks being $i, j, t_1, p - h$, 
respectively. By a similar discussion we can then obtain results analogous to 
Theorems 3 and 4; we therefore omit the details. 

According to Theorem 4, we can obtain an algorithm for finding the solution 
$X_e$ of Problem I when $g = h$. 

Algorithm 1 

1. Input $A, B, C$ and $X_f$. 

2. Make the QSVD of the matrix pair $(A, B^T)$ by (11.7) and partition the matrix $U^T C V = 
(C_{kl})_{3 \times 3}$ by (11.19). 

3. Compute the matrix $C_0$ by (11.20). 

4. Make the CCD of the matrix pair $(A^T, B)$ by (11.12) and partition the matrices 
$X^* = (X_{kl})_{6 \times 6}$ and $E = (E_{kl})_{4 \times 4}$ by (11.23) and (11.24), respectively. 

5. Compute the matrix blocks $\{X_{kl}^{(f)}\}$, $X_{22}$ and $X_{23}$ by (11.32), (11.33) and (11.34), 
respectively. 

6. Compute the solution $X_e$ by (11.31). 



11.3 The Solution of Problem II 

The process of solving Problem II is similar to that of solving Problem I. 
Therefore we only present the main steps and main results when $g = \mathrm{rank}(A) = 
\mathrm{rank}(B) = h$, while the proofs are omitted. 

The first step is to transform Problem II into the following equivalent problem. 
Problem II$_0$ Given matrices $A \in R^{m \times n}$, $B \in R^{n \times p}$, $C_a \in R^{m \times p}$ and $X_f \in R^{n \times n}$, let 

$$S_{A_0} = \{X \mid X \in AR^{n \times n},\ AXB = C_a\}. \tag{11.36}$$

Then find $X_a \in S_{A_0}$ such that 

$$\|X_a - X_f\|_F = \min_{X \in S_{A_0}} \|X - X_f\|_F. \tag{11.37}$$

Theorem 5 Given $A \in R^{m \times n}$, $B \in R^{n \times p}$, $C \in R^{m \times p}$, let $X_s$ be any solution of (11.2) 
with $\Omega$ the skew-symmetric cone, and define 

$$C_a = A X_s B. \tag{11.38}$$

Then the matrix equation 

$$AXB = C_a \tag{11.39}$$

is consistent in $AR^{n \times n}$, and the skew-symmetric solution set $S_{A_0}$ of the matrix 
equation (11.39) is the same as the skew-symmetric solution set $S_A$ of the least- 
squares problem (11.2). 

The second step is to find the expression of $C_a$ by using the QSVD theorem. 

Theorem 6 Let $A, B, C$ be given in Problem II, let the matrix pair $(A, B^T)$ have the 
QSVD (11.7), and let $U^T C V$ have the partition (11.19). Then for any skew-symmetric 
solution $X_s$ of the least-squares problem (11.2), the matrix $C_a$ determined by 
(11.38) can be expressed in the following form: 

$$C_a = U C^* V^T, \qquad C^* = \begin{pmatrix} 0 & C_{12} & C_{13} \\ 0 & S \hat{X}_{22} & C_{23} \\ 0 & 0 & 0 \end{pmatrix}, \tag{11.40}$$

where 

$$\hat{X}_{22} = \Phi * (S C_{22} - C_{22}^T S), \qquad \Phi = (\varphi_{kl}) \in SR^{s' \times s'}, \quad \varphi_{kl} = \frac{1}{\sigma_k^2 + \sigma_l^2}, \quad 1 \le k, l \le s'. \tag{11.41}$$


Now suppose $X \in S_{A_0}$, and partition the skew-symmetric matrix $X_* = Q^T X Q$ 
into the block matrix 

$$X_* = (X_{kl})_{6 \times 6}, \tag{11.42}$$

with the row dimensions (and the corresponding column dimensions) of the blocks 
being $i, j, t, n - g - j - t, j, t$, respectively, and $X_{kl} = -X_{lk}^T$, $k, l = 1, 2, \ldots, 6$. Let 
$F = X_A^T C_a X_B$ and also partition $F$ into the block matrix 

$$F = (F_{kl})_{4 \times 4}, \tag{11.43}$$

with the row dimensions of the blocks being $i, j, t, m - g$ and the column dimensions of 
the blocks being $i, j, t, p - g$, respectively. 

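The next step is to present the general form of the elements of $S_{A_0}$ by using 
the CCD theorem. 

Theorem 7 In Problem II$_0$, the general form of $X \in S_{A_0}$ can be expressed as 

$$X = Q \begin{pmatrix} F_{11} & F_{12} & F_{13} & X_{14} & -X_{51}^{(0)T} & -F_{31}^T \\ -F_{12}^T & X_{22} & X_{23} & X_{24} & -X_{52}^{(0)T} & -F_{32}^T \\ -F_{13}^T & -X_{23}^T & X_{33} & X_{34} & -X_{53}^{(0)T} & -F_{33}^T \\ -X_{14}^T & -X_{24}^T & -X_{34}^T & X_{44} & X_{45} & X_{46} \\ X_{51}^{(0)} & X_{52}^{(0)} & X_{53}^{(0)} & -X_{45}^T & X_{55} & X_{56} \\ F_{31} & F_{32} & F_{33} & -X_{46}^T & -X_{56}^T & X_{66} \end{pmatrix} Q^T, \tag{11.44}$$

where $X_{kk} = -X_{kk}^T$, $2 \le k \le 6$; $X_{14}, X_{23}, X_{24}, X_{34}, X_{45}, X_{46}$ and $X_{56}$ are arbitrary 
matrices with the associated sizes; and 

$$X_{51}^{(0)} = \Delta_j^{-1}(F_{21} + \Lambda_j F_{12}^T), \quad X_{52}^{(0)} = \Delta_j^{-1}(F_{22} - \Lambda_j X_{22}), \quad X_{53}^{(0)} = \Delta_j^{-1}(F_{23} - \Lambda_j X_{23}).$$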

Finally we give a lemma about the skew-symmetric solution of a minimum 
problem, and present the solution of Problem II. 

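Lemma 4 For given matrices $J_1, J_2$ and $J_3 \in R^{s \times s}$, $S_a = \mathrm{diag}(a_1, \ldots, a_s) > 0$, 
$S_b = \mathrm{diag}(b_1, \ldots, b_s) > 0$, $S_c = \mathrm{diag}(c_1, \ldots, c_s) > 0$, there exists a unique 
skew-symmetric matrix $W \in AR^{s \times s}$ such that 

$$\|S_a W - J_1\|_F^2 + \|S_b W - J_2\|_F^2 + \|S_c W - J_3\|_F^2 = \min,$$

and $W$ can be expressed as 

$$W = \Phi * (S_a J_1 - J_1^T S_a + S_b J_2 - J_2^T S_b + S_c J_3 - J_3^T S_c), \tag{11.45}$$

where 

$$\Phi = (\varphi_{kl}) \in R^{s \times s}, \quad \varphi_{kl} = 1/(a_k^2 + a_l^2 + b_k^2 + b_l^2 + c_k^2 + c_l^2), \quad 1 \le k, l \le s.$$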



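Theorem 8 Let matrices $A$, $B$, $C$ and $X_f$ be given in Problem II, suppose 
rank$(A) = $ rank$(B)$, and partition the matrix $Q^T X_f Q$ into the block matrix (11.30). 
Then there is a unique solution $X_a$ in Problem II, and $X_a$ can be expressed as 

$$X_a = Q \begin{pmatrix} F_{11} & F_{12} & F_{13} & Y_{14}^{(f)} & -Y_{51}^{(0)T} & -F_{31}^T \\ -F_{12}^T & Y_{22} & Y_{23} & Y_{24}^{(f)} & -Y_{52}^{(0)T} & -F_{32}^T \\ -F_{13}^T & -Y_{23}^T & Y_{33}^{(f)} & Y_{34}^{(f)} & -Y_{53}^{(0)T} & -F_{33}^T \\ Y_{41}^{(f)} & Y_{42}^{(f)} & Y_{43}^{(f)} & Y_{44}^{(f)} & Y_{45}^{(f)} & Y_{46}^{(f)} \\ Y_{51}^{(0)} & Y_{52}^{(0)} & Y_{53}^{(0)} & Y_{54}^{(f)} & Y_{55}^{(f)} & Y_{56}^{(f)} \\ F_{31} & F_{32} & F_{33} & Y_{64}^{(f)} & Y_{65}^{(f)} & Y_{66}^{(f)} \end{pmatrix} Q^T, \tag{11.46}$$

where 

$$Y_{kl}^{(f)} = \tfrac{1}{2}\left(X_{kl}^{(f)} - X_{lk}^{(f)T}\right),$$

$$\begin{aligned} Y_{22} = \Phi * \big[ & X_{22}^{(f)} - X_{22}^{(f)T} + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} F_{22} + X_{25}^{(f)T}) - (\Delta_j^{-1} F_{22} + X_{25}^{(f)T})^T \Lambda_j \Delta_j^{-1} \\ & + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} F_{22} - X_{52}^{(f)}) - (\Delta_j^{-1} F_{22} - X_{52}^{(f)})^T \Lambda_j \Delta_j^{-1} \big], \end{aligned}$$

$$Y_{23} = G * \left[ X_{23}^{(f)} - X_{32}^{(f)T} + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} F_{23} + X_{35}^{(f)T}) + \Delta_j^{-1} \Lambda_j (\Delta_j^{-1} F_{23} - X_{53}^{(f)}) \right],$$

with 

$$\Phi = (\varphi_{kl}) \in R^{j \times j}, \quad \varphi_{kl} = \frac{1}{2\left(1 + \lambda_{i+k}^2/\delta_{i+k}^2 + \lambda_{i+l}^2/\delta_{i+l}^2\right)}, \qquad G = (g_{kl}) \in R^{j \times t}, \quad g_{kl} = \tfrac{1}{2} \delta_{i+k}^2,$$

and 

$$Y_{51}^{(0)} = \Delta_j^{-1}(F_{21} + \Lambda_j F_{12}^T), \quad Y_{52}^{(0)} = \Delta_j^{-1}(F_{22} - \Lambda_j Y_{22}), \quad Y_{53}^{(0)} = \Delta_j^{-1}(F_{23} - \Lambda_j Y_{23}).$$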



11.4 Conclusions 

Using the projection theorem in a Hilbert space, the quotient singular value 
decomposition (QSVD) and the canonical correlation decomposition (CCD), we 
obtained explicit analytical expressions of the optimal approximation solutions for 
the symmetric and skew-symmetric Procrustes problems related to the linear 
matrix equation AXB = C. According to these new results, we can design new 
algorithms to solve optimal approximation problems for the constrained linear 
matrix equation and related least-squares problems, which can be applied to some 
scientific fields, such as inverse problems of vibration theory, control theory, finite 
elements, and multidimensional approximation and so on. In some aspects, we 
have generalized the works of Higham [13], Sun [23], Chu [4] and Hua [5]. 
Chapter 12 

Some Inverse Eigenvalue and Pole 
Placement Problems for Linear 
and Quadratic Pencils 

Sylvan Elhay 



Abstract Differential equation models for vibrating systems are associated with 
matrix eigenvalue problems. Frequently the undamped models lead to problems of 
the generalized eigenvalue type and damped models lead to problems of the 
quadratic eigenvalue type. The matrices in these systems are typically real and 
symmetric and are quite highly structured. The design and stabilisation of systems 
modelled by these equations (e.g., undamped and damped vibrating systems) 
requires the determination of solutions to the inverse eigenvalue problems which 
are themselves real, symmetric and possibly have some other structural properties. 
In this talk we consider some pole assignment problems and inverse spectral 
problems for generalized and quadratic symmetric pencils, discuss some advances 
and point to some work that remains to be done. 

Dedicated with friendship and respect to Biswa N. Datta for his 
contributions to mathematics. 

S. Elhay 2008 



12.1 Introduction: Linear and Quadratic 
Eigenvalue Problems 

The equation 

$$M v''(t) + C v'(t) + K v(t) = 0, \tag{12.1}$$

$M, C, K \in \mathbb{R}^{n \times n}$ and $v(t) \in \mathbb{R}^n$, is used in many engineering applications to model 
natural phenomena. In the context of the free vibrations of a linear, time-invariant 
vibratory system, $M$, $C$ and $K$ are the mass, damping and stiffness matrices, 
respectively. 

Separation of variables, 

$$v(t) = x e^{\lambda t},$$

with $x$ a constant vector, leads to an eigenvalue problem for the quadratic pencil 

$$Q(\lambda) = \lambda^2 M + \lambda C + K. \tag{12.2}$$

When $M$ is symmetric positive definite (spd) and $C$, $K$ are symmetric we say 
that $Q$ is symmetric definite, and throughout this paper we will assume, unless 
stated otherwise, that (12.2) is real and symmetric definite. 

Consider first the linear pencil 

$$P(\lambda) = K - \lambda M,$$

which is a special case of $Q(\mu)$ in which $C = O$ and we use the substitution 
$\mu^2 = -\lambda$. The scalar $\lambda_j$ and the associated $n$-vector $x_j \ne 0$ are called an eigenpair 
of $P$ if they satisfy 

$$P(\lambda_j) x_j = 0. \tag{12.3}$$

The pencil $P$ has $n$ real eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and $n$ linearly independent, 
real eigenvectors $x_1, x_2, \ldots, x_n$ [29] because $M$ is spd. We denote the spectrum, 
$\{\lambda_j\}_{j=1}^{n}$, of $P$ variously as 

$$\sigma(P(\lambda)) \quad \text{or} \quad \sigma(M, K).$$

We can assemble all the relations of the form (12.3) into a single matrix 
equation 

$$KX - MX\Lambda = O$$

if we define $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ and $X = [x_1, x_2, \ldots, x_n]$. It is well known 
[29] that the eigenvectors in $X$ satisfy the orthogonality relations 

$$x_i^T K x_j = 0 = x_i^T M x_j, \quad i \ne j,$$

and their scaling can be chosen so that 

$$X^T K X = \Lambda \quad \text{and} \quad X^T M X = I. \tag{12.4}$$

Thus, $X$ is the matrix which simultaneously diagonalizes the two matrices 
$M$ and $K$. The eigenvalues of $P$ are given by the Rayleigh quotients 
$$\lambda_j = \frac{x_j^T K x_j}{x_j^T M x_j} = x_j^T K x_j$$

if the scaling (12.4) is used. The orthogonality of the eigenvectors has extensive 
practical application in science and engineering (see [7, pp. 512-531, 23]). 
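
The simultaneous diagonalization of M and K can be reproduced numerically by reducing the generalized problem to a standard symmetric one through a Cholesky factor of M. The sketch below is illustrative only: the matrices are random stand-ins (M made symmetric positive definite by construction, K merely symmetric), and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
G = rng.standard_normal((n, n))
M = G @ G.T + n * np.eye(n)            # symmetric positive definite "mass" matrix
K = rng.standard_normal((n, n))
K = K + K.T                            # symmetric "stiffness" matrix

L = np.linalg.cholesky(M)              # M = L L^T
# A = L^{-1} K L^{-T}, formed with two triangular-system solves.
A = np.linalg.solve(L, np.linalg.solve(L, K).T).T
A = 0.5 * (A + A.T)                    # remove rounding asymmetry
lam, Z = np.linalg.eigh(A)             # A Z = Z diag(lam), Z orthogonal
X = np.linalg.solve(L.T, Z)            # X = L^{-T} Z

# Rayleigh quotients x_j^T K x_j / x_j^T M x_j, one per eigenvector.
rayleigh = np.array([X[:, j] @ K @ X[:, j] / (X[:, j] @ M @ X[:, j])
                     for j in range(n)])
```

With this scaling the columns of X are M-orthonormal, so the Rayleigh quotients reduce to the diagonal entries of the transformed stiffness matrix.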

The quadratic pencil (12.2) has $2n$ finite eigenvalues which are the zeros of the 
degree $2n$ polynomial 

$$\det Q(\lambda) = \det M \, \lambda^{2n} + a_{2n-1} \lambda^{2n-1} + \cdots,$$

and we will assume, unless stated otherwise, that the spectrum of $Q$, 

$$\sigma(Q(\lambda)) = \sigma(M, C, K) = \{\lambda_j\}_{j=1}^{2n},$$

consists of $2n$ distinct eigenvalues. The pencil $Q$ then has $n$ linearly independent 
eigenvectors and we can write the $n \times 2n$ system 

$$M X \Lambda^2 + C X \Lambda + K X = O \tag{12.5}$$

in which $X \in \mathbb{C}^{n \times 2n}$ and $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_{2n}\} \in \mathbb{C}^{2n \times 2n}$. Since $M, C, K$ are 
real and $M$ is invertible, the eigenvectors and eigenvalues of $Q$ are pairwise self- 
conjugate in the sense that they are self-conjugate and $x_i = \bar{x}_j$ whenever $\lambda_i = \bar{\lambda}_j$, 
for all $i$ and $j$. 

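Relation (12.5) is sometimes linearised into block companion form as 

$$\begin{pmatrix} O & I \\ -M^{-1} K & -M^{-1} C \end{pmatrix} \begin{pmatrix} X \\ X \Lambda \end{pmatrix} = \begin{pmatrix} X \\ X \Lambda \end{pmatrix} \Lambda,$$

or its symmetric, generalized eigenvalue problem equivalent 

$$\begin{pmatrix} O & K \\ K & C \end{pmatrix} \begin{pmatrix} X \\ X \Lambda \end{pmatrix} = \begin{pmatrix} K & O \\ O & -M \end{pmatrix} \begin{pmatrix} X \\ X \Lambda \end{pmatrix} \Lambda.$$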
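
The block companion linearisation gives a direct, if not structure-preserving, way to compute the eigenpairs of a small quadratic pencil. The sketch below builds the companion matrix for random symmetric test matrices, takes the top n rows of each eigenvector as x, and checks the residual of the quadratic eigenvalue equation. The test matrices, sizes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
G = rng.standard_normal((n, n))
M = G @ G.T + n * np.eye(n)            # symmetric positive definite
C = rng.standard_normal((n, n))
C = C + C.T                            # symmetric
K = rng.standard_normal((n, n))
K = K + K.T                            # symmetric

# Block companion matrix [[O, I], [-M^{-1} K, -M^{-1} C]].
comp = np.block([[np.zeros((n, n)), np.eye(n)],
                 [-np.linalg.solve(M, K), -np.linalg.solve(M, C)]])
lam, Z = np.linalg.eig(comp)           # 2n eigenvalues, eigenvectors [x; lambda x]
X = Z[:n, :]                           # top blocks are eigenvectors of the pencil

# Largest residual of (lambda^2 M + lambda C + K) x over all 2n eigenpairs.
res = max(np.linalg.norm((l**2 * M + l * C + K) @ X[:, j])
          for j, l in enumerate(lam))
# The bottom block of each eigenvector should equal lambda times the top block.
stack_err = np.linalg.norm(Z[n:, :] - X * lam)
```

A general-purpose eigensolver applied to the companion matrix does not exploit the symmetry of M, C and K, which is one motivation for the symmetric generalized form.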

There is [9] a set of three orthogonality relations$^1$ for the quadratic pencil which 
generalize the orthogonality relations (12.4) for the linear pencil: 

$$\left. \begin{aligned} \Lambda X^T M X \Lambda - X^T K X &= D_1, \\ \Lambda X^T C X \Lambda + \Lambda X^T K X + X^T K X \Lambda &= D_2, \\ \Lambda X^T M X + X^T M X \Lambda + X^T C X &= D_3, \end{aligned} \right\} \tag{12.6}$$

where the three diagonal matrices, $D_1$, $D_2$, $D_3$, satisfy 

$$D_1 = D_3 \Lambda, \quad D_2 = -D_1 \Lambda \quad \text{and} \quad D_2 = -D_3 \Lambda^2.$$

$^1$ In this paper a conjugate transpose is denoted by a superscript H while transposition, which is 
denoted by a superscript T, does not mean conjugate transpose even for complex quantities. 
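These relations, written componentwise, are 

$$\left. \begin{aligned} x_i^T (\lambda_i \lambda_j M - K) x_j &= 0 \\ x_i^T (\lambda_i \lambda_j C + (\lambda_i + \lambda_j) K) x_j &= 0 \\ x_i^T ((\lambda_i + \lambda_j) M + C) x_j &= 0 \end{aligned} \right\} \quad i \ne j.$$

Provided the denominators do not vanish, we can also write 

$$-(\lambda_i + \lambda_j) = \frac{x_i^T C x_j}{x_i^T M x_j}, \quad i \ne j,$$

and the Rayleigh-quotient-like expressions 

$$-\frac{1}{\lambda_i^2} = \frac{x_i^T (2 \lambda_i M + C) x_i}{x_i^T (\lambda_i^2 C + 2 \lambda_i K) x_i}, \quad -\frac{1}{\lambda_i} = \frac{x_i^T (\lambda_i^2 M - K) x_i}{x_i^T (\lambda_i^2 C + 2 \lambda_i K) x_i}, \quad \lambda_i = \frac{x_i^T (\lambda_i^2 M - K) x_i}{x_i^T (2 \lambda_i M + C) x_i}, \quad i = 1, 2, \ldots, 2n.$$

Note that when $C = O$, this last relation simplifies to the Rayleigh quotient 
$x_j^T K x_j = \lambda_j$ (recall the substitution $\mu^2 = -\lambda$). Unlike the linear case, however, 
there is in general no way to simultaneously diagonalize three symmetric matrices. 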

Table 12.1 shows information about the eigendata of quadratic pencils and is 
drawn from the excellent survey of the quadratic eigenvalue problem by Tisseur 
and Meerbergen [35]. As is pointed out in their survey, quadratic eigenvalue 
problems arise in the dynamic analysis of structural mechanical and acoustic 
systems, in electrical circuit simulation, in fluid mechanics, in signal processing and 
in modeling microelectronic mechanical systems. 

In this paper we describe some methods for the solution of certain inverse 
eigenvalue and pole placement problems for the linear and quadratic pencils. Our 
interest here is in the linear algebra although we will sometimes also make ref- 
erence to the application of the problems we discuss in the study of vibrations. 

The rest of the paper is organised as follows. In Sect. 12.2 we discuss an inverse 
eigenvalue problem for a matrix pair that arises in the modelling of the axial 
vibrations of a rod. The problem can be solved [33] by fixed point iteration and has 
an interesting reformulation as either an inverse eigenvalue problem for a (sym- 
metric, tridiagonal, unreduced) Jacobi matrix or an inverse singular value problem 
for a bidiagonal, unit lower triangular matrix. 

In Sect. 12.3 we briefly mention the solution [31] to an inverse eigenvalue
problem for the symmetric definite quadratic pencil which has application to the
study of damped oscillatory systems.

Table 12.1 Some properties of the eigenvalues and eigenvectors of quadratic pencils. Here $\gamma(M, C, K) = \min_{\|x\|_2 = 1} \left[ (x^H C x)^2 - 4 (x^H M x)(x^H K x) \right]$.

| Matrix | Eigenvalues | Eigenvectors |
|---|---|---|
| $M$ nonsingular | $2n$ finite eigenvalues | |
| $M$ singular | Finite and infinite eigenvalues | |
| $M$, $C$, $K$ real | Finite eigenvalues are real or conjugate pairs | If $x$ is a right eigenvector of $\lambda$ then $\bar{x}$ is a right eigenvector of $\bar{\lambda}$ |
| $M$, $C$, $K$ Hermitian | Finite eigenvalues are real or conjugate pairs | If $x$ is a right eigenvector of $\lambda$ then $x$ is a left eigenvector of $\bar{\lambda}$ |
| $M$ Hermitian positive definite, $C$, $K$ Hermitian positive semidefinite | $\mathrm{Re}(\lambda) \le 0$ | |
| $M$ Hermitian positive definite, $C$ Hermitian, $K$ Hermitian negative definite | Real eigenvalues | Real eigenvectors |
| $M$, $C$ symmetric positive definite, $K$ symmetric positive semidefinite, $\gamma(M, C, K) > 0$ (overdamped) | $\lambda$'s are real and negative, gap between $n$ largest and $n$ smallest eigenvalues | $n$ linearly independent eigenvectors associated with the $n$ largest ($n$ smallest) eigenvalues |
| $M$, $K$ Hermitian, $M$ positive definite, $C = -C^H$ | Eigenvalues are pure imaginary or come in pairs $(\lambda, -\bar{\lambda})$ | If $x$ is a right eigenvector of $\lambda$ then $x$ is a left eigenvector of $-\bar{\lambda}$ |
| $M$, $K$ real symmetric and positive definite, $C = -C^T$ | Eigenvalues are pure imaginary | |

In Sect. 12.4 we consider problems of partial pole assignment by single-input 
control for the symmetric definite quadratic pencil. We describe an explicit and a 
computational solution [9] which both derive from the orthogonality relations 
(12.6). 

In Sect. 12.5 we consider three methods [8, 32] for partial pole assignment by
multi-input control for the symmetric definite quadratic pencil. The first parallels the
computational solution for the single-input case. The second, not described in detail,
again uses the orthogonality relations (12.6) to construct a multi-step solution and the
last is based on the Cholesky factorization and the singular value decomposition.

In Sect. 12.6 we describe a technique [10] for solving the problem of partial 
eigenstructure assignment by multi-input control for the symmetric definite qua- 
dratic pencil. This method is again based on the exploitation of the orthogonality 
relations (12.6). 

In Sect. 12.7 we describe two methods [17, 18] of pole assignment for the 
symmetric definite quadratic pencil by affine sums and in Sect. 12.8 we describe an 
explicit formula for symmetry preserving partial pole assignment to a symmetric 
definite matrix pair. 

Finally, we summarise in Sect. 12.9 and indicate some further worthwhile work.

All the methods described in this paper are the result of work done variously
with Professors B.N. Datta, G.H. Golub and Y.M. Ram. The author tenders
grateful tribute to their contributions.



12.2 An Inverse Eigenvalue Problem for a Linear Pencil 
Arising in the Vibration of Rods 

To begin with we consider an inverse eigenvalue problem for a rather special
matrix pair. The problem arises in a discretization of the eigenvalue problem
$(EAy')' + \lambda \rho A y = 0$, $0 < x < 1$, with boundary conditions $y(0) = 0$, $y(1) = 0$,
associated with the axial oscillations of a non-uniform rod. Here $E$, $\rho$ and $A$ are
problem parameters.

Denote the Kronecker delta by $\delta_{ij}$ and denote by $I$, $S$ and $E$, respectively, the
$n \times n$ identity, shift, and exchange matrices.

Further, denote by $F = I - S$ the finite difference matrix.

Problem 2.1 Given a set $S = \{\lambda_i\}_{i=1}^{n}$ of real numbers such that $\prod_{j=1}^{n} \lambda_j = 1$, find a
diagonal matrix $D = \mathrm{diag}\{d_1, d_2, \ldots, d_n\}$, $d_i > 0$, such that the pencil $P(\lambda) = F^T D F - \lambda D$ has spectrum $S$.

Note that if $D$ is a solution to Problem 2.1 then $\alpha D$, $\alpha$ a scalar, is also a solution so
we can, without loss of generality, set $d_1 = 1$. Note also that $\sigma(P) = \sigma(D^{-1} F^T D F)$
and, in view of the fact that $\det(F) = 1$ we know that the eigenvalues of this
system must therefore satisfy $\prod_{k=1}^{n} \lambda_k = 1$. Thus,

a. $n - 1$ eigenvalues uniquely determine the $n$th and

b. there exist a finite number, at most $(n - 1)!$, different (generally complex)
solutions $D$ even though only real positive solutions have a physical meaning in
the context of vibrations.

Define B = D^SD^S^ - S^DS and b = d„ED^E. Interestingly, the three 

systems 

F^DF-W, F^DF + B}D, and F^DF - W, 

are isospectral (see [33] for details). 

We can recast Problem 2.1 into an interesting equivalent inverse standard
eigenvalue problem form. Denote by $H$ the sign matrix $[H]_{ij} = (-1)^i \delta_{ij}$. Then the
matrix $H D^{-1/2} F^T D F D^{-1/2} H$ has the same eigenvalues as $P(\lambda) = F^T D F - \lambda D$ and
the (symmetric, tridiagonal, unreduced) Jacobi matrix $J = H D^{-1/2} F^T D F D^{-1/2} H - I$
which has eigenvalues $\omega_i = \lambda_i - 1$ has the quite special structure

$$J = \begin{pmatrix}
0 & \beta_1 & & & \\
\beta_1 & \beta_1^2 & \beta_2 & & \\
& \beta_2 & \beta_2^2 & \ddots & \\
& & \ddots & \ddots & \beta_{n-1} \\
& & & \beta_{n-1} & \beta_{n-1}^2
\end{pmatrix},
\quad \text{where} \quad \beta_i = \sqrt{d_i / d_{i+1}}, \quad i = 1, 2, \ldots, n - 1.$$

Our problem is now to reconstruct $J$ from its spectrum only. There are just
$n - 1$ free parameters to be found from $n - 1$ independent pieces of data: $\omega_i$,
$i = 1, 2, \ldots, n$ constrained by $\prod_{k=1}^{n} (1 + \omega_k) = 1$. The equivalent singular value
assignment problem is

Problem 2.2 Find, if it exists, a unit lower bidiagonal matrix

$$L = \begin{pmatrix}
1 & & & \\
\beta_1 & 1 & & \\
& \beta_2 & 1 & \\
& & \ddots & \ddots \\
& & & \beta_{n-1} & 1
\end{pmatrix}$$

which has prescribed singular values $\sigma_1, \sigma_2, \ldots, \sigma_n$ constrained by $\prod_{k=1}^{n} \sigma_k = 1$.

$L$ here is the Cholesky factor $L L^T = J + I$.
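The chain of reformulations above can be checked on a small example. The sketch below is illustrative only and assumes NumPy; the weights `d` are hypothetical, and the shift matrix is assumed to carry its ones on the superdiagonal (the orientation is an assumption made here so that $\det(F) = 1$ and the bidiagonal factor comes out lower triangular).

```python
import numpy as np

n = 4
d = np.array([1.0, 2.0, 0.5, 3.0])     # hypothetical positive weights, d_1 = 1
I = np.eye(n)
S = np.diag(np.ones(n - 1), k=1)       # assumed orientation: superdiagonal shift
F = I - S                              # finite difference matrix, det(F) = 1
D = np.diag(d)

# Symmetric matrix with the same spectrum as P(lambda) = F^T D F - lambda D.
Dmh = np.diag(1.0 / np.sqrt(d))        # D^{-1/2}
Sym = Dmh @ F.T @ D @ F @ Dmh
lam = np.linalg.eigvalsh(Sym)          # ascending eigenvalues

# Jacobi matrix J = H D^{-1/2} F^T D F D^{-1/2} H - I.
H = np.diag([(-1.0) ** i for i in range(1, n + 1)])
J = H @ Sym @ H - I

# Unit lower bidiagonal factor with beta_i = sqrt(d_i / d_{i+1}).
beta = np.sqrt(d[:-1] / d[1:])
L = I + np.diag(beta, k=-1)
sigma = np.linalg.svd(L, compute_uv=False)

print("product of eigenvalues:", np.prod(lam))
print("L L^T == J + I:", np.allclose(L @ L.T, J + I))
print("sigma^2 == lambda:", np.allclose(np.sort(sigma ** 2), lam))
```

Under these assumptions the eigenvalues of the pencil are the squared singular values of $L$, which is why the two constraints $\prod \lambda_k = 1$ and $\prod \sigma_k = 1$ are equivalent.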

It turns out [33] that the {i3.}/'Ji satisfy a fixed point equation AQA^b = c 
where Q = diagjoj] , 02, • ■ ■, (o}„ 





, b = 


( 1 \ 

Pi 

[Plj 


, c = 


/ \ 

Kit- J 



and each $a_i$ defines a diagonal scaling matrix that turns the first row of the
eigenvector matrix of $J$ into the $i$th row. A linearly convergent fixed point iteration
scheme for the solution of this problem is given in Ram and Elhay [33].



12.3 An Inverse Eigenvalue Problem for the Tridiagonal, 
Symmetric Definite Quadratic Pencil 



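We now consider the system of monic, time differential equations

$$I v'' + C v' + K v = 0,$$

in which the matrices $C, K \in \mathbb{R}^{n \times n}$,

$$C = \begin{pmatrix}
\alpha_1 & \beta_1 & & & \\
\beta_1 & \alpha_2 & \beta_2 & & \\
& \beta_2 & \alpha_3 & \ddots & \\
& & \ddots & \ddots & \beta_{n-1} \\
& & & \beta_{n-1} & \alpha_n
\end{pmatrix},
\qquad
K = \begin{pmatrix}
\gamma_1 & \delta_1 & & & \\
\delta_1 & \gamma_2 & \delta_2 & & \\
& \delta_2 & \gamma_3 & \ddots & \\
& & \ddots & \ddots & \delta_{n-1} \\
& & & \delta_{n-1} & \gamma_n
\end{pmatrix},$$

are tridiagonal, symmetric, $I$ is an identity, and $v = v(t)$. The corresponding quadratic pencil is

$$Q(\lambda) = \lambda^2 I + \lambda C + K. \qquad (12.7)$$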

Our problem here is to find $C$ and $K$ from the spectrum of the full pencil and the
spectrum of the reduced pencil in which the last row and column have been
deleted. More precisely, we need to solve

Problem 3.1 Given two sets of distinct numbers $\{\lambda_k\}_{k=1}^{2n}$ and $\{\mu_k\}_{k=1}^{2n-2}$, find tridiagonal
symmetric $C$ and $K$ such that $Q(\lambda) = \lambda^2 I + \lambda C + K$ satisfies

$$\det(Q(\lambda)) \text{ has zeros } \{\lambda_k\}_{k=1}^{2n}$$

and the matrix $\hat{Q}(\lambda)$, obtained by deleting the last row and column of $Q(\lambda)$,
satisfies

$$\det(\hat{Q}(\lambda)) \text{ has zeros } \{\mu_k\}_{k=1}^{2n-2}.$$

The problem of reconstructing one unreduced, symmetric, tridiagonal matrix
from its $n$ eigenvalues and those of its leading principal submatrix of dimension
$n - 1$ is related to Problem 3.1 (it is a special case of (12.7) with $K$ a Jacobi matrix
and $C = O$) and has received much attention in the literature (see e.g., [3, 13, 21,
22]). In vibrations, this may be regarded as identifying the spring configurations of
an undamped system from its spectrum and the spectrum of the constrained
system where the last mass is restricted to have no motion. Problem 3.1 corresponds
to determining the spring and damper configurations of a non-conservative
vibratory system which has a prescribed spectrum and which is such that the
associated constrained system has a prescribed spectrum.

We have shown by construction [31] that this problem is always soluble over
the complex field, it has at most $2^n (2n - 3)!/(n - 2)!$ solutions and all solutions
can be found by a method that, aside from finding the roots of certain polynomials,
requires only a finite number of steps.

The solution matrices should (a) have positive diagonal elements, (b) have
negative off-diagonal elements and (c) be weakly diagonally dominant, for practical
realizations in vibrations. Finding solutions that satisfy these constraints for
large, real problems remains a challenge.

12.4 Partial Pole Assignment by Single-Input Control
for the Symmetric Definite Quadratic Pencil

Partial pole assignment by state feedback control for a system modeled by a set of 
second order differential equations is used in the stabilization and control of 
flexible, large, space structures where only a small part of the spectrum is to be 
reassigned and the rest of the spectrum is required to remain unchanged. 

In this problem the homogeneous differential equation (12.1) is replaced by an
equation with a forcing function $b u(t)$, $b \in \mathbb{R}^n$ a constant, and $u(t)$ a scalar, by
which we want to control this system. Thus the differential equation is now

$$M v'' + C v' + K v = b u(t) \qquad (12.8)$$

and we seek constant $f, g \in \mathbb{R}^n$ which define the control

$$u(t) = f^T v'(t) + g^T v(t) \qquad (12.9)$$

which will assign all, or part, of the spectrum of the system. Substituting for $u(t)$ in
(12.8) with (12.9) leads to the closed loop system

$$M v'' + (C - b f^T) v' + (K - b g^T) v = 0,$$

the dynamics of which are characterised by the eigenvalues of the closed loop
pencil

$$Q_c(\lambda) = M \lambda^2 + (C - b f^T) \lambda + (K - b g^T).$$
It is well known [27] that the system is completely controllable if and only if

$$\mathrm{rank}\{\lambda^2 M + \lambda C + K, \; b\} = n,$$

for every eigenvalue of $Q$. Complete controllability is a necessary and sufficient
condition for the existence of $f$ and $g$ such that the closed-loop pencil has a
spectrum that can be assigned arbitrarily. However, if the system is only partially
controllable, i.e., if

$$\mathrm{rank}\{\lambda^2 M + \lambda C + K, \; b\} = n,$$

only for $m$ of the eigenvalues $\lambda = \lambda_{i_k}$, $k = 1, 2, \ldots, m$, $m < n$, of the pencil, then
only those eigenvalues can be arbitrarily assigned by an appropriate choice of $f$
and $g$. The system (12.8) is partially controllable iff the vector $b$ is not orthogonal
to $\{x_{i_k}\}_{k=1}^{m}$, the eigenvectors corresponding to the assignable eigenvalues $\{\lambda_{i_k}\}_{k=1}^{m}$.

This problem can be approached by using the block companion first order realization,
i.e. by finding $\hat{y}$ such that the $2n \times 2n$ matrix $A - \hat{b} \hat{y}^T$, where

$$A = \begin{pmatrix} O & I \\ -M^{-1}K & -M^{-1}C \end{pmatrix}, \quad
\hat{b} = \begin{pmatrix} 0 \\ M^{-1} b \end{pmatrix}, \quad
\hat{y} = \begin{pmatrix} -g \\ -f \end{pmatrix},$$

has the desired spectrum. The first order realization solution, however, is not
always suitable since it does not respect structural properties such as sparsity,
bandedness or positive definiteness that are sometimes assets for this problem.

It is often also an important practical requirement that no spill-over, the
phenomenon in which eigenvalues not intended to be changed are modified by the
process, occurs [1, 2]. In fact, partial pole assignment without spillover is possible
[9] knowing only the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ which are to be assigned and their
associated eigenvectors. Where $n$ is large and only $m \ll n$ eigenvalues are to be
assigned this can be a significant advantage. The eigendata can be found by
computation using Krylov subspace methods [30], or by modal analysis measurements
when the physical structure is available [23]. More precisely, the
problem we now have is:

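Problem 4.1 Let (12.5), $\lambda_i$ distinct, be an eigendecomposition of the quadratic
open loop pencil (12.2).

Given a self-conjugate set of $m < 2n$ complex numbers $\mu_1, \mu_2, \ldots, \mu_m$ and a
vector $b \in \mathbb{R}^n$,

find $f, g \in \mathbb{C}^n$ which are such that the closed loop pencil

$$Q_c(\lambda) = M \lambda^2 + (C - b f^T) \lambda + (K - b g^T), \qquad (12.10)$$

has spectrum $\{\mu_1, \mu_2, \ldots, \mu_m, \lambda_{m+1}, \ldots, \lambda_{2n}\}$.

Let us partition the $n \times 2n$ eigenvector matrix and $2n \times 2n$ eigenvalue matrix as

$$X = (\underbrace{X_1}_{m} \;\; \underbrace{X_2}_{2n-m}), \qquad
\Lambda = \begin{pmatrix} \Lambda_1 & \\ & \Lambda_2 \end{pmatrix}, \qquad (12.11)$$

where the diagonal blocks $\Lambda_1$ and $\Lambda_2$ have orders $m$ and $2n - m$, respectively.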

The theorem that follows gives conditions on the vectors $f$, $g$ which ensure that
the eigenvalues not being assigned remain unchanged after the assignment, thus
avoiding spillover.

Theorem 4.1 Let

$$f = M X_1 \Lambda_1 \beta, \qquad g = -K X_1 \beta, \qquad \beta \in \mathbb{C}^m. \qquad (12.12)$$

Then, for any choice of $\beta$ we have

$$M X_2 \Lambda_2^2 + (C - b f^T) X_2 \Lambda_2 + (K - b g^T) X_2 = O.$$

If there exists such a vector $\beta$, then there exist an eigenvector matrix $Y \in \mathbb{C}^{n \times m}$,
$Y = (y_1, y_2, \ldots, y_m)$, $y_j \neq 0$, $j = 1, 2, \ldots, m$,
and a diagonal matrix, $D = \mathrm{diag}\{\mu_1, \mu_2, \ldots, \mu_m\}$ of the eigenvalues to be assigned,
which are such that

$$M Y D^2 + (C - b f^T) Y D + (K - b g^T) Y = O.$$

So

$$\begin{aligned}
M Y D^2 + C Y D + K Y &= b \beta^T (\Lambda_1 X_1^T M Y D - X_1^T K Y) \\
&= b \beta^T Z_1^T \\
&= b c^T,
\end{aligned}$$

where $Z_1 = D Y^T M X_1 \Lambda_1 - Y^T K X_1$ and $c = Z_1 \beta$ is a vector that will depend on the
scaling chosen for the eigenvectors in $Y$. To obtain $Y$, we can solve in turn for each
of the eigenvectors $y_j$ using the equations

$$(\mu_j^2 M + \mu_j C + K) y_j = b, \qquad j = 1, 2, \ldots, m.$$

This corresponds to choosing the vector $c = (1, 1, \ldots, 1)^T$, so, having computed
the eigenvectors we could solve the $m$-square system $Z_1 \beta = (1, 1, \ldots, 1)^T$ for $\beta$,
and hence determine the vectors $f$, $g$ using Theorem 4.1. However, there exists an
explicit solution for this problem:

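Theorem 4.2 Suppose the open loop quadratic pencil (12.2) has eigendecomposition
(12.5) and that $f$ and $g$ are as in (12.12) with the components $\beta_j$ of $\beta$
chosen as

$$\beta_j = \frac{1}{b^T x_j} \, \frac{\mu_j - \lambda_j}{\lambda_j} \prod_{\substack{i=1 \\ i \neq j}}^{m} \frac{\mu_i - \lambda_j}{\lambda_i - \lambda_j}, \qquad j = 1, 2, \ldots, m. \qquad (12.13)$$

Then, the closed loop pencil (12.10) has spectrum $\{\mu_1, \mu_2, \ldots, \mu_m, \lambda_{m+1}, \ldots, \lambda_{2n}\}$
and its first $m$ eigenvectors can be scaled to satisfy $(\mu_j^2 M + \mu_j C + K) y_j = b$.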

Formula (12.13) reveals three conditions (for the existence of $\beta$) that apply to
the $m$ eigenvalues which will be replaced, and their associated eigenvectors:

a. no $\lambda_j$, $j = 1, 2, \ldots, m$ may vanish,

b. the $\{\lambda_j\}_{j=1}^{m}$ must be distinct, and

c. $b$ must be not orthogonal to $x_j$, $j = 1, 2, \ldots, m$.

We note that if all the $\{\lambda_j\}_{j=1}^{m}$ are real, then $X_1$ is real as well. If, in addition, all
the $\{\mu_j\}_{j=1}^{m}$ are real, then $\beta$ and so $f$, $g$ are also real.

Furthermore, if the set of eigenvalues which are to be replaced is self-conjugate
then $f$ and $g$ are real and they specify a solution which can be physically realized.
Indeed, the whole calculation can be done in real arithmetic [9].
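The explicit solution can be tried out on a small example. The following is a minimal sketch, assuming NumPy; it evaluates the product formula (12.13), forms $f$ and $g$ as in (12.12), and compares the closed loop spectrum with the open loop one. The matrices, the vector `b`, the targets `mu` and the helper names `quad_eig` and `single_input_gains` are all hypothetical choices made here for illustration, and complex arithmetic is used for simplicity rather than the real-arithmetic variant of [9].

```python
import numpy as np

def quad_eig(M, C, K):
    """Eigenvalues/eigenvectors of lambda^2 M + lambda C + K (companion form)."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Minv @ K, -Minv @ C]])
    lam, Z = np.linalg.eig(A)
    return lam, Z[:n, :]

def single_input_gains(M, K, b, lam1, X1, mu):
    """f, g as in (12.12), with beta from the product formula (12.13)."""
    m = len(lam1)
    beta = np.empty(m, dtype=complex)
    for j in range(m):
        val = (mu[j] - lam1[j]) / (lam1[j] * (b @ X1[:, j]))
        for i in range(m):
            if i != j:
                val *= (mu[i] - lam1[j]) / (lam1[i] - lam1[j])
        beta[j] = val
    f = M @ X1 @ (lam1 * beta)      # M X_1 Lambda_1 beta
    g = -K @ X1 @ beta              # -K X_1 beta
    return f, g

# Hypothetical lightly damped system (three complex conjugate pairs).
M = np.diag([1.0, 2.0, 1.5])
C = np.array([[0.5, -0.1, 0.0], [-0.1, 0.4, -0.2], [0.0, -0.2, 0.6]])
K = np.array([[4.0, -2.0, 0.0], [-2.0, 5.0, -3.0], [0.0, -3.0, 6.0]])
b = np.array([1.0, 0.5, 2.0])

lam, X = quad_eig(M, C, K)
# Replace the conjugate pair with the largest |imaginary part| ...
idx = [int(np.argmax(lam.imag)), int(np.argmin(lam.imag))]
keep = [i for i in range(lam.size) if i not in idx]
mu = np.array([-1.0, -2.0])         # ... by two real targets

f, g = single_input_gains(M, K, b, lam[idx], X[:, idx], mu)
lam_closed, _ = quad_eig(M, C - np.outer(b, f.real), K - np.outer(b, g.real))

print("imaginary parts of f, g:", np.abs(f.imag).max(), np.abs(g.imag).max())
print("closed loop spectrum:", np.sort_complex(lam_closed))
```

Because the replaced set and the target set are both self-conjugate, the imaginary parts of `f` and `g` should vanish to rounding error, and the four eigenvalues in `keep` should reappear unchanged in the closed loop spectrum (no spillover).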

Finally, we note that any problem in which condition (a) is violated can be
handled with a shift of origin:

Lemma 4.3 Let

$$Q(\lambda) = \lambda^2 U + \lambda V + W,$$

$U$ invertible, have spectrum $\{\lambda_i\}_{i=1}^{2n}$. Then, the pencil

$$\hat{Q}(\lambda) = \lambda^2 U + \lambda \hat{V} + \hat{W}$$

with

$$\hat{V} = V + 2pU, \qquad \hat{W} = W + pV + p^2 U,$$

scalar $p$, has spectrum $\{\lambda_i - p\}_{i=1}^{2n}$. If, in addition, $Q(\lambda)$ is symmetric definite, then
$\hat{Q}(\lambda)$ is also symmetric definite.
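The shift property in Lemma 4.3 follows from a one-line expansion. Writing $Q(\lambda) = \lambda^2 U + \lambda V + W$ for the original pencil and $\hat{V} = V + 2pU$, $\hat{W} = W + pV + p^2 U$ for the shifted coefficients, the shifted pencil is simply $Q$ evaluated at $\lambda + p$:

```latex
\begin{aligned}
Q(\lambda + p) &= (\lambda + p)^2 U + (\lambda + p) V + W \\
               &= \lambda^2 U + \lambda (V + 2pU) + (W + pV + p^2 U) \\
               &= \lambda^2 U + \lambda \hat{V} + \hat{W} = \hat{Q}(\lambda).
\end{aligned}
```

Hence $\det \hat{Q}(\lambda) = 0$ exactly when $\lambda + p$ is an eigenvalue of $Q$, that is, when $\lambda = \lambda_i - p$. Choosing $p$ so that no shifted eigenvalue is zero removes the obstruction in condition (a).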

12.5 Partial Pole Assignment by Multi-Input Control 
for the Symmetric Definite Quadratic Pencil 

Although the pole assignment problem by single input control and its solution are
satisfactory from a theoretical standpoint, one may still encounter some difficulties
when using this solution in a practical vibrations context. It may happen that the
control force required to relocate the poles is so large that it cannot be implemented
in practice without causing an early structural fatigue. This practical difficulty
may be overcome by using a multi-input control, one in which the vector $b$
of (12.8) is replaced by a matrix $B \in \mathbb{R}^{n \times m}$ and the scalar $u(t)$ is replaced by a vector
$u(t) \in \mathbb{R}^m$. Our differential equation is now

$$M v'' + C v' + K v = B u(t)$$

and we seek $F, G \in \mathbb{R}^{n \times m}$ to assign part or all of the spectrum of (12.2).

In this section we describe methods that are applicable to two slightly different
problems. The first deals with the case where we make no extra assumptions about
the given matrices $C$, $K$ and the second concerns the case where $C$ and $K$ have the
additional property that they are non-negative definite. Recall that this extra
property ensures that all the eigenvalues of the pencil $Q$ have non-positive real
part, something which is important in the stability of dynamic systems. The second
method achieves the assignment while ensuring that all the eigenvalues not
assigned, even though they may have been moved, still have non-positive real part.

For alternative treatments of this problem which also address the issue of 
robustness see [4, 5]. 



12.5.1 Method 1 

Problem 5.1 Given $2p < 2n$ self-conjugate numbers $\mu_1, \mu_2, \ldots, \mu_{2p}$, and $B \in \mathbb{R}^{n \times m}$,
find $F, G \in \mathbb{R}^{n \times m}$ such that the closed-loop system

$$Q_c(\lambda) = M \lambda^2 + (C - B F^T) \lambda + (K - B G^T), \qquad (12.14)$$

has spectrum $\{\mu_1, \ldots, \mu_{2p}, \lambda_{2p+1}, \ldots, \lambda_{2n}\}$.

As before, the problem has a computational solution [32] that avoids spillover but
an explicit solution for this problem has yet to be found.

Let the eigenvalue and eigenvector matrices of $Q$ be partitioned as in (12.11).
The following theorem is the matrix counterpart of Theorem 4.1 and gives conditions
on $\mathcal{B}$ which ensure that there is no spillover. The multi-input control form
(12.15) directly generalises the single-input form (12.12).

Theorem 5.1 Let

$$F = M X_1 \Lambda_1 \mathcal{B}, \qquad G = -K X_1 \mathcal{B}, \qquad \mathcal{B} \in \mathbb{C}^{2p \times m}. \qquad (12.15)$$

Then, for any choice of $\mathcal{B}$ we have

$$M X_2 \Lambda_2^2 + (C - B F^T) X_2 \Lambda_2 + (K - B G^T) X_2 = O.$$

Any choice of $\mathcal{B}$ with $F$, $G$ chosen thus guarantees that the last $2n - 2p$
eigenpairs $(\Lambda_2, X_2)$ are also eigenpairs of the closed loop pencil.

In a development analogous to the vector case we can find [32] the matrix $\mathcal{B}$ and
from it $F$, $G$ which solve the problem in one step. An alternative, multi-step solution,
is also available which uses the explicit solution (12.13) to the single input case. One
application of the method based on Theorem 4.1 leaves the pencil unsymmetric so
we cannot use that technique repeatedly. However, it is possible to use the orthogonality
relations (12.6) to compute $F$ and $G$ which will do the job [32].

12.5.2 Method 2 

We now consider multi-input control systems where the matrices C,K are non- 
negative definite, thus ensuring that all eigenvalues of the pencil have non-positive 
real part. The technique of this section [8] assigns part of the spectrum and avoids 
spillover, not by preserving the eigenvalues which are not assigned, but by 
ensuring only that the unassigned eigenvalues, even if they have been moved by 
the assignment, have non-positive real part. 

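Problem 5.2 Given the pencil (12.2) with $M$ spd, $C$, $K$ non-negative definite,
$B \in \mathbb{R}^{n \times m}$ and a self-conjugate set $\{\mu_k\}_{k=1}^{2p}$ of $2p < 2n$ scalars, find matrices $F$,
$G \in \mathbb{R}^{n \times m}$ such that the spectrum of the closed-loop pencil (12.14) contains the set
$\{\mu_k\}_{k=1}^{2p}$ and the complementary part of the spectrum has non-positive real part.

We begin by constructing diagonal $p \times n$ matrices

$$\hat{D}_\alpha = (D_\alpha \;\; O), \qquad \hat{D}_\beta = (D_\beta \;\; O),$$

with column blocks of widths $p$ and $n - p$, which give $\lambda^2 I + \lambda D_\alpha + D_\beta$ the required eigenvalues. We then compute the
Cholesky and SVD factorings

$$L L^T = M, \qquad U S V^T = L^{-1} B,$$

$U$ and $V$ orthogonal and $S$ diagonal. Since the spectrum of $Q$ is invariant to
multiplication by an invertible matrix, we can write

$$\sigma(M, \; C - B F^T, \; K - B G^T) = \sigma(I, \; \hat{C} - S \hat{F}^T, \; \hat{K} - S \hat{G}^T),$$

where we have defined

$$\hat{C} = U^T L^{-1} C L^{-T} U, \qquad \hat{K} = U^T L^{-1} K L^{-T} U,$$
$$\hat{F}^T = V^T F^T L^{-T} U, \qquad \hat{G}^T = V^T G^T L^{-T} U.$$

To find $\hat{F}$, $\hat{G}$ we note that, with the blocking

$$\hat{C} = \begin{pmatrix} \hat{C}_1 \\ \hat{C}_2 \end{pmatrix}, \qquad
\hat{K} = \begin{pmatrix} \hat{K}_1 \\ \hat{K}_2 \end{pmatrix}, \qquad
S = \begin{pmatrix} S_1 \\ O \end{pmatrix},$$

in which the row blocks have heights $p$ and $n - p$ and $S_1$ is $p \times p$,
the choice $\hat{F} = (\hat{C}_1 - \hat{D}_\alpha)^T S_1^{-T}$ makes the first block row of $\hat{C} - S \hat{F}^T$ equal to $(D_\alpha \;\; O)$
and leaves the other $n - p$ rows unchanged. Similarly, the choice $\hat{G} = (\hat{K}_1 - \hat{D}_\beta)^T S_1^{-T}$
makes the first block row of $\hat{K} - S \hat{G}^T$ equal to $(D_\beta \;\; O)$ and leaves the other
$n - p$ rows unchanged. Thus

$$\hat{Q}_c(\lambda) = \begin{pmatrix} Q_1(\lambda) & O \\ * & Q_2(\lambda) \end{pmatrix},$$

with row blocks of heights $p$ and $n - p$, and $\sigma(\hat{Q}_c(\lambda)) = \sigma(Q_1) \cup \sigma(Q_2)$. In addition $Q_1(\lambda) = \lambda^2 I_p + \lambda D_\alpha + D_\beta$ is diagonal
and has the assigned eigenvalues. It is easy to show [8] that $Q_2$ is a positive
semidefinite pencil and all its eigenvalues have real part that is non-positive.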



12.6 Partial Eigenstructure Assignment by Multi-Input 
Control for the Symmetric Definite Quadratic Pencil 

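We now consider the case where the problem is to find all three control matrices
$F$, $G$ and $B$ to replace some eigenpairs of the quadratic pencil (12.2). Assume we
have the eigendecomposition (12.5) and the partitioning (12.11).

Problem 6.1 Given

a. $Q$ as in (12.2) with eigendecomposition (12.5),

b. $X_1$ and $\Lambda_1$ pairwise self-conjugate and partitioned as in (12.11),

c. $Y_1 \in \mathbb{C}^{n \times m}$, $D_1 \in \mathbb{C}^{m \times m}$, pairwise self-conjugate such that with

$$Y = (\underbrace{Y_1}_{m} \;\; \underbrace{X_2}_{2n-m}), \qquad
D = \begin{pmatrix} D_1 & \\ & \Lambda_2 \end{pmatrix},$$

the matrix $\begin{pmatrix} Y \\ YD \end{pmatrix}$ is invertible.

Find $B, F, G \in \mathbb{R}^{n \times m}$ which satisfy

$$M Y D^2 + (C - B F^T) Y D + (K - B G^T) Y = O. \qquad (12.16)$$

The process of finding the $F$, $G$ and $B$ which will assign these $m$ eigenvalues
and their associated eigenvectors is done in two stages:

a. First, determine matrices $\hat{B}$, $\hat{F}$ and $\hat{G}$ which are generally complex and which
satisfy

$$M Y D^2 + (C - \hat{B} \hat{F}^T) Y D + (K - \hat{B} \hat{G}^T) Y = O. \qquad (12.17)$$

b. Second, from $\hat{B}$, $\hat{F}$ and $\hat{G}$ find real $B$, $F$, and $G$ such that $B F^T = \hat{B} \hat{F}^T$ and
$B G^T = \hat{B} \hat{G}^T$.

The first stage proceeds as follows. Let $W \in \mathbb{C}^{m \times p}$, $p \ge m$ have pseudoinverse
$W^+ \in \mathbb{C}^{p \times m}$ such that $W W^+ = I \in \mathbb{R}^{m \times m}$. If $\hat{B}$, $\hat{F}$ and $\hat{G}$ is a solution of
Problem 6.1 then

$$\tilde{B} = \hat{B} W, \qquad \tilde{F} = \hat{F} (W^+)^T, \qquad \tilde{G} = \hat{G} (W^+)^T$$

is another solution because $\tilde{B} \tilde{F}^T = \hat{B} \hat{F}^T$ and $\tilde{B} \tilde{G}^T = \hat{B} \hat{G}^T$. Now, if $\hat{B}$, $\hat{F}$ and $\hat{G}$ is a
solution and satisfies (12.16) then

$$M Y_1 D_1^2 + (C - \hat{B} \hat{F}^T) Y_1 D_1 + (K - \hat{B} \hat{G}^T) Y_1 = O$$

and it is evident that we can take $\tilde{B}$ and $W$ as

$$\underbrace{M Y_1 D_1^2 + C Y_1 D_1 + K Y_1}_{\tilde{B}} = \hat{B} \underbrace{(\hat{F}^T Y_1 D_1 + \hat{G}^T Y_1)}_{W}, \qquad (12.18)$$

provided that $W$ is invertible. Then for some $\tilde{F}$ and $\tilde{G}$ this $\tilde{B} = \hat{B} W$ is an admissible
solution. Equation (12.17), written for $\tilde{B}$, $\tilde{F}$, $\tilde{G}$ and blocked according to the partitioning of $Y$ and $D$,
has first block row

$$M Y_1 D_1^2 + C Y_1 D_1 + K Y_1 - \tilde{B} \tilde{F}^T Y_1 D_1 - \tilde{B} \tilde{G}^T Y_1 = O$$

and, taking the definition of $\tilde{B}$ from (12.18), this gives $\tilde{B} (I - \tilde{F}^T Y_1 D_1 - \tilde{G}^T Y_1) = O$.
The full rank of $\tilde{B}$ implies that

$$\tilde{F}^T Y_1 D_1 + \tilde{G}^T Y_1 = I. \qquad (12.19)$$

The following theorem is a variation of Theorem 5.1.

Theorem 6.1 For any $\mathcal{B} \in \mathbb{C}^{m \times m}$, $\tilde{F} = M X_1 \Lambda_1 \mathcal{B}$ and $\tilde{G} = -K X_1 \mathcal{B}$ satisfy

$$M X_2 \Lambda_2^2 + (C - \tilde{B} \tilde{F}^T) X_2 \Lambda_2 + (K - \tilde{B} \tilde{G}^T) X_2 = O.$$

Putting these expressions for $\tilde{F}$, $\tilde{G}$ into (12.19) gives

$$\mathcal{B} = (\Lambda_1 X_1^T M Y_1 D_1 - X_1^T K Y_1)^{-T}$$

from which $\tilde{F}$ and $\tilde{G}$ can easily be found using the expressions in Theorem 6.1.

It is easy to show that although $\tilde{B}$, $\tilde{F}$, $\tilde{G}$ are generally complex, the products
$\tilde{B} \tilde{F}^T$ and $\tilde{B} \tilde{G}^T$ are always pure real. The second stage of the solution, that of
finding real $B$, $F$, $G$ from the generally complex $\tilde{B}$, $\tilde{F}$, $\tilde{G}$ can be carried out using
any one of several factorings of the form $L R = H$, $L \in \mathbb{R}^{n \times m}$, $R \in \mathbb{R}^{m \times 2n}$ of the
composite $n \times 2n$ product matrix

$$H = \tilde{B} \, [\tilde{F}^T \mid \tilde{G}^T].$$

We take $B$ to be the $L$ matrix, the first $n$ columns of $R$ to be $F^T$ and the last
$n$ columns to be $G^T$. The two factorings which immediately come to mind
are the truncated QR and compact SVD [15]. Details can be found in Datta
et al. [10].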



12.7 Pole Assignment for the Symmetric Definite Quadratic 
Pencil by Affine Sums 

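In this section we consider the pole assignment problem by means of linear
combinations of matrix sets which span the space of matrices in which our target
matrix lies. Two methods, in which the solutions are found by zeroing two
different nonlinear functions, are described. We motivate the methods by first
considering the standard eigenvalue problem which arises from a set of first order,
time differential equations

$$x' = A x, \qquad A = A^T \in \mathbb{R}^{n \times n}, \qquad x \in \mathbb{R}^n.$$

This system's behaviour is characterized by eigenpairs $(\mu_k, x_k)$, $\mu_k$ scalar, $0 \neq x_k \in \mathbb{R}^n$,
which satisfy $(A - \mu I) x = 0$. The Affine Inverse Eigenvalue Problem
requires us to determine the vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^T$ of coefficients, if they
exist, which define the matrix

$$A = A_0 + \sum_{k=1}^{n} \alpha_k A_k,$$

from its spectrum. The elements $A_k \in \mathbb{R}^{n \times n}$, $k = 0, 1, 2, \ldots, n$ in the linear combination
comprise the affine family. As an example, the eigenvalues of the matrix

$$\begin{pmatrix}
\kappa_1 + \kappa_2 & -\kappa_2 & & & \\
-\kappa_2 & \kappa_2 + \kappa_3 & -\kappa_3 & & \\
& -\kappa_3 & \kappa_3 + \kappa_4 & -\kappa_4 & \\
& & \ddots & \ddots & \ddots \\
& & & -\kappa_n & \kappa_n
\end{pmatrix}$$

determine the natural frequencies of a certain $n$-degrees-of-freedom mass-spring
system with spring constants $\kappa_i > 0$ and unit masses. An affine family suitable for
such a $4 \times 4$ problem might consist of the set $A_{0,1,2,3}$ defined by

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 & -1 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix}. \qquad (12.20)$$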



We note that the matrices in the affine set are real and symmetric and tridiagonal.
Hence the spectrum of $A$ is real.

Friedland et al. [20] consider several Newton-based algorithms for solving the
affine inverse eigenvalue problem. Starting with some initial estimate of the
solution $\alpha^{(0)}$, the methods find the zero $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^T$ of

$$f(\alpha) = (\lambda_1(\alpha) - \mu_1, \; \lambda_2(\alpha) - \mu_2, \; \ldots, \; \lambda_n(\alpha) - \mu_n)^T$$

where $\{\mu_k\}_{k=1}^{n}$ is the ordered target (real) spectrum, and $\lambda_j(\alpha^{(m)})$ is the $j$-th eigenvalue of
the similarly ordered spectrum of

$$A_0 + \sum_{k=1}^{n} \alpha_k^{(m)} A_k.$$

A correction vector $\xi^{(m)}$ to $\alpha^{(m)}$ is found from $J^{(m)} \xi^{(m)} = -f^{(m)}$ where the
Jacobian is

$$J = \begin{pmatrix}
x_1^T A_1 x_1 & x_1^T A_2 x_1 & \cdots & x_1^T A_n x_1 \\
x_2^T A_1 x_2 & x_2^T A_2 x_2 & \cdots & x_2^T A_n x_2 \\
\vdots & \vdots & & \vdots \\
x_n^T A_1 x_n & x_n^T A_2 x_n & \cdots & x_n^T A_n x_n
\end{pmatrix} \qquad (12.21)$$

and $x_i$ is the eigenvector associated with $\lambda_i(\alpha^{(m)})$. The next iterate is found from
$\alpha^{(m+1)} = \alpha^{(m)} + \xi^{(m)}$ and the process continues until convergence or divergence
occurs.
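The Newton iteration just described can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy, unit-norm eigenvectors as returned by `numpy.linalg.eigh`, a zero matrix for $A_0$, and a hypothetical 3 x 3 mass-spring family of the type shown in (12.20); it is not the implementation of [20].

```python
import numpy as np

def spring_family(n):
    """A_1 = e_1 e_1^T and A_k = (e_{k-1} - e_k)(e_{k-1} - e_k)^T, k = 2..n."""
    E = np.eye(n)
    vecs = [E[0]] + [E[k - 1] - E[k] for k in range(1, n)]
    return [np.outer(v, v) for v in vecs]

def assemble(alpha, fam):
    """A(alpha) = sum_k alpha_k A_k (A_0 is taken to be zero here)."""
    return sum(a * Ak for a, Ak in zip(alpha, fam))

def newton_affine_iep(fam, mu, alpha0, tol=1e-12, max_iter=50):
    """Newton iteration on f(alpha) = lambda(alpha) - mu, ordered spectra."""
    alpha = np.array(alpha0, dtype=float)
    mu = np.sort(np.asarray(mu, dtype=float))
    for _ in range(max_iter):
        lam, X = np.linalg.eigh(assemble(alpha, fam))   # ascending order
        f = lam - mu
        if np.linalg.norm(f) < tol:
            break
        # Jacobian (12.21): J[i, j] = x_i^T A_j x_i
        J = np.array([[X[:, i] @ Aj @ X[:, i] for Aj in fam]
                      for i in range(len(mu))])
        alpha = alpha + np.linalg.solve(J, -f)
    return alpha

fam = spring_family(3)
alpha_true = np.array([1.0, 2.0, 3.0])            # hypothetical spring constants
mu = np.linalg.eigvalsh(assemble(alpha_true, fam))
alpha = newton_affine_iep(fam, mu, [1.05, 1.95, 3.05])
print("recovered coefficients:", alpha)
print("spectrum error:", np.linalg.norm(np.linalg.eigvalsh(assemble(alpha, fam)) - mu))
```

The natural ordering returned by `eigh` is what allows each computed eigenvalue to be paired with its target; this is the feature that is lost when the eigenvalues may be complex.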

For the quadratic problem we are interested in 

Problem 7.1 Given

a. $M \in \mathbb{R}^{n \times n}$ spd,

b. $\{C_k\}_{k=0}^{n}$, $\{K_k\}_{k=0}^{n}$, $C_k, K_k \in \mathbb{R}^{n \times n}$, symmetric, linearly independent,

c. $S = \{\mu_k\}_{k=1}^{2n}$, a self-conjugate set of scalars.

Define $C = C_0 + \sum_{k=1}^{n} \alpha_k C_k$ and $K = K_0 + \sum_{k=1}^{n} \beta_k K_k$.
Find real scalars $\{\alpha_k, \beta_k\}_{k=1}^{n}$, if they exist, such that the pencil (12.2) has
spectrum $S$.

Affine methods are attractive because, unlike some other methods, they preserve
structural properties like symmetry, bandedness and sparseness. We now
describe two affine methods of pole assignment for the quadratic symmetric definite
pencil. In each a Newton method is used to solve the system of nonlinear
equations that give the affine coefficients which achieve the assignment.



12.7.1 Affine Method 1 



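Denote the vector of target eigenvalues by $\mu = (\mu_1, \mu_2, \ldots, \mu_{2n})^T$ and the vectors
of the unknown coefficients by $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^T$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_n)^T$.
Denote also the vector of open loop eigenvalues by $\lambda(\alpha, \beta) = (\lambda_1(\alpha, \beta), \lambda_2(\alpha, \beta), \ldots, \lambda_{2n}(\alpha, \beta))^T$. We will assume that

• the target spectrum $S$ defines a solution,

• there exists an open neighbourhood of $\alpha$, $\beta$ where $Q(\lambda)$ has eigenvalues and
eigenvectors which are analytic,

• the target eigenvalues and the eigenvalues for the current $\alpha$ and $\beta$ iterates have the
same number of reals and complex pairs.

Let us assume that the target real eigenvalues are ordered $\mu_1 < \mu_2 < \cdots < \mu_r$
and that $2n - r = 2c$ complex eigenvalues are paired

$$\begin{aligned}
\mu_{r+1} &= \rho_1 + i \eta_1, & \mu_{r+2} &= \rho_1 - i \eta_1, \\
\mu_{r+3} &= \rho_2 + i \eta_2, & \mu_{r+4} &= \rho_2 - i \eta_2, \\
&\;\;\vdots \\
\mu_{2n-1} &= \rho_c + i \eta_c, & \mu_{2n} &= \rho_c - i \eta_c.
\end{aligned}$$

Suppose also that the eigenvalues to be replaced are similarly ordered $\lambda_1 < \lambda_2 < \cdots < \lambda_r$,

$$\begin{aligned}
\lambda_{r+1} &= \phi_1 + i \psi_1, & \lambda_{r+2} &= \phi_1 - i \psi_1, \\
\lambda_{r+3} &= \phi_2 + i \psi_2, & \lambda_{r+4} &= \phi_2 - i \psi_2, \\
&\;\;\vdots \\
\lambda_{2n-1} &= \phi_c + i \psi_c, & \lambda_{2n} &= \phi_c - i \psi_c,
\end{aligned}$$

as are their pure real eigenvectors $z_i$, $i = 1, 2, \ldots, r$ and their complex eigenvectors

$$\begin{aligned}
z_{r+1} &= x_1 + i y_1, & z_{r+2} &= x_1 - i y_1, \\
z_{r+3} &= x_2 + i y_2, & z_{r+4} &= x_2 - i y_2, \\
&\;\;\vdots \\
z_{2n-1} &= x_c + i y_c, & z_{2n} &= x_c - i y_c.
\end{aligned}$$

A Newton method for $\alpha$ and $\beta$ which parallels the method described earlier for
the inverse standard eigenvalue problem zeros the function

$$f(\alpha, \beta) = \begin{pmatrix}
\lambda_1(\alpha, \beta) - \mu_1 \\
\lambda_2(\alpha, \beta) - \mu_2 \\
\vdots \\
\lambda_r(\alpha, \beta) - \mu_r \\
\phi_1(\alpha, \beta) - \rho_1 \\
\vdots \\
\phi_c(\alpha, \beta) - \rho_c \\
\psi_1(\alpha, \beta) - \eta_1 \\
\vdots \\
\psi_c(\alpha, \beta) - \eta_c
\end{pmatrix}.$$

We now derive the Jacobian for this function. Every eigenpair $\lambda_i$, $z_i$ satisfies

$$z_i^T (\lambda_i^2 M + \lambda_i C + K) z_i = 0. \qquad (12.22)$$

Denote a derivative with respect to either $\alpha_j$ or $\beta_j$ by a dot. Differentiating
(12.22) gives

$$2 \dot{z}_i^T (\lambda_i^2 M + \lambda_i C + K) z_i + z_i^T (2 \lambda_i \dot{\lambda}_i M + \dot{\lambda}_i C + \lambda_i \dot{C} + \dot{K}) z_i = 0$$

which simplifies immediately to

$$z_i^T \left( \dot{\lambda}_i (2 \lambda_i M + C) + \lambda_i \dot{C} + \dot{K} \right) z_i = 0.$$

Isolating $\dot{\lambda}_i$ gives

$$\dot{\lambda}_i = -\frac{z_i^T (\lambda_i \dot{C} + \dot{K}) z_i}{z_i^T (2 \lambda_i M + C) z_i}.$$

Since

$$\frac{\partial C}{\partial \alpha_j} = C_j, \qquad \frac{\partial C}{\partial \beta_j} = \frac{\partial K}{\partial \alpha_j} = O, \qquad \frac{\partial K}{\partial \beta_j} = K_j,$$

we get, provided the denominators do not vanish,

$$\frac{\partial \lambda_i}{\partial \alpha_j} = -\frac{\lambda_i \, z_i^T C_j z_i}{z_i^T (2 \lambda_i M + C) z_i}
\qquad \text{and} \qquad
\frac{\partial \lambda_i}{\partial \beta_j} = -\frac{z_i^T K_j z_i}{z_i^T (2 \lambda_i M + C) z_i}.$$

The Newton method is now

$$J(\alpha^{(m)}, \beta^{(m)}) \begin{pmatrix} \alpha^{(m+1)} - \alpha^{(m)} \\ \beta^{(m+1)} - \beta^{(m)} \end{pmatrix} = -f(\alpha^{(m)}, \beta^{(m)}).$$

For details see [18].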

Some simple examples illustrate the technique. All calculations here were
performed in Matlab with IEEE Standard Double Precision arithmetic, i.e. using
$\varepsilon \approx 2 \times 10^{-16}$. All numbers are correctly rounded to the number of figures shown.

Example 7.1 Let $n = 5$, $M = I$, and let the affine family be

$$C_0 = K_0 = \begin{pmatrix}
10 & -10 & 0 & 0 & 0 \\
-10 & 18 & -8 & 0 & 0 \\
0 & -8 & 12 & -4 & 0 \\
0 & 0 & -4 & 12 & -8 \\
0 & 0 & 0 & -8 & 11
\end{pmatrix}$$

together with the rank-1, symmetric, tridiagonal elements

a. $C_k = K_k = (e_k - e_{k+1})(e_k - e_{k+1})^T$, $k = 1, 2, \ldots, n - 1$,

b. $C_n = K_n = e_n e_n^T$.

These elements are of the same type as those shown in (12.20). For

$$\alpha = -\beta = (-1, 1, -1, 1, -1)^T \qquad (12.23)$$

the pencil $Q$ has eigenvalues shown in Table 12.2. Using the starting values
$\alpha = -\beta = (-1.2, 1.4, -1.6, 1.8, 2)^T$ to assign the eigenvalues in Table 12.2, the
Newton method finds the solution (12.23) with the characteristic quadratic convergence
and after 9 iterations we get $\| \lambda(\alpha^{(9)}, \beta^{(9)}) - \lambda(\alpha, \beta) \|_2 < 10^{-14}$.

Table 12.2 Eigenvalues $\lambda(\alpha, \beta)$ of $Q$ with $\alpha$, $\beta$ defined by (12.23)

    -26.07397
    -18.47433
     -8.91370
     -2.48789
     -1.11603
     -0.36370
     -1.69234 - 2.43496i
     -1.69234 + 2.43496i
     -0.09285 - 0.72845i
     -0.09285 + 0.72845i

Table 12.3 A second solution which assigns the eigenvalues in Table 12.2

    -0.54550    1.08070   -0.87850    1.11642
     1.29810   -0.58530    0.89416   -1.08588
    -0.83550    1.47810   -0.99899    1.05144
     1.23910   -0.70220    0.99300   -1.16071
    -0.70140    1.01440   -1.00968    1.01043

Table 12.3 shows a second starting value and solution (of the many possible). This
solution is obtained to the same accuracy as the first solution in 8 iterations.



12.7.2 Affine Method 2 



A difficulty with the method in Sect. 12.7.1 is that it requires an uncomfortable
assumption: that the number of real and complex pairs of eigenvalues remain the
same throughout the iteration process. However, these can change from one
iteration to the next even though after some point close to convergence they will
remain the same.

The crucial element is the ordering of the eigenvalues. In the real, symmetric,
inverse standard eigenvalue problem there is a natural ordering of the (always real)
eigenvalues at every step. We are thus able to pair $\lambda_k(\alpha^{(m)})$ to $\mu_k$ correctly in $f^{(m)}$ at
each iterative step.

In principle the algorithm described at the start of Sect. 12.7 for the real,
symmetric inverse standard eigenvalue problem can be extended to solve
affine inverse eigenvalue problems associated with non-symmetric matrices. The
Jacobian matrix elements of (12.21) are replaced by

$$\frac{\partial \lambda_i}{\partial \alpha_j} = \frac{y_i^T A_j x_i}{y_i^T x_i},$$

where $y_i$ here is the $i$th left eigenvector of $A$. The algorithm now can frequently fail
however since the eigenvalues of this (real) $A$ are generally complex and there is
no natural way to order the eigenvalues consistently throughout the iterations. A
simple example illustrates the problem. Consider the affine sum



Ml 



/I 





0\ 




/-I 1 


0\ 





-1 





+ s 


1 


1 





2 








-2 1 


yo 





-2^ 




\-4 


-5 ij 



a. For $\delta = 0$ this affine sum has eigenvalues $\pm 1, \pm 2$.

b. For $\delta = 1$ it has eigenvalues $\pm i, \pm 2i$.

The bifurcation point where the four real eigenvalues become two complex
conjugate pairs is at $\delta = 1/(1 + \sqrt{3})$. Any labelling scheme based on an ordering
which is used for the real eigenvalues when $\delta < 1/(1 + \sqrt{3})$ has no natural
extension once $\delta > 1/(1 + \sqrt{3})$.

A technique for overcoming this problem of ordering [17] is based on zeroing
the function

$$f_i(\alpha) = \det\Big(A_0 - \mu_i I_n + \sum_{k=1}^n \alpha_k A_k\Big), \quad i = 1, 2, \ldots, n$$

rather than $f_i(\alpha) = \lambda_i(\alpha) - \mu_i$, $i = 1, 2, \ldots, n$. Since no pairing of the eigenvalues is
needed, the method is applicable to

• non-symmetric affine inverse linear eigenvalue problems with complex 
eigenvalues, 

• non-symmetric inverse generalized eigenvalue problems, and 

• inverse quadratic eigenvalue problems and higher order matrix polynomial 
inverse eigenvalue problems. 

The regular pencil $P(\lambda) = \lambda A + B$ ($\det(P(\lambda))$ not identically 0) has as its
eigenvalues

a. the zeros of $p_k(\lambda) = \det(P(\lambda))$, a polynomial of degree k, and

b. $\infty$ with multiplicity $n - k$ if $k < n$.

If $P(\lambda)$ is regular, there is a generalized Schur decomposition [14] which gives
$Q, R \in \mathbb{C}^{n \times n}$, unitary, such that

$$Q P(\lambda) R = \lambda T_A + T_B,$$

$T_A, T_B \in \mathbb{C}^{n \times n}$ upper triangular. Quotients of the diagonal elements $a_1, a_2, \ldots, a_n$
and $b_1, b_2, \ldots, b_n$ of $T_A$ and $T_B$ define the eigenvalues of the pencil. Importantly,
the matrices Q and R can be chosen so that

• the pairs $(a_j, b_j)$ are in any order on the diagonals of $T_A$ and $T_B$ and

• $\det(Q) = \det(R) = 1$.



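Now, $\det(P(\lambda)) = \prod_{j=1}^n (\lambda a_j + b_j)$. If $a_j \ne 0 \ne b_j$ for all j then we can differentiate to get

$$\frac{\partial}{\partial \lambda} \det(\lambda A + B) = \sum_{i=1}^n a_i \prod_{j=1,\, j \ne i}^n (\lambda a_j + b_j).$$

Otherwise, if $a_1, a_2, \ldots, a_r$ are nonzero and $a_{r+1} = a_{r+2} = \cdots = a_n = 0$, then
we have (none of the $b_j$, $j = r+1, \ldots, n$, can vanish by the assumption of regularity)

$$\frac{\partial}{\partial \lambda} \det(\lambda A + B) = \Big(\prod_{j=r+1}^n b_j\Big) \sum_{i=1}^r a_i \prod_{j=1,\, j \ne i}^r (\lambda a_j + b_j).$$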
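The product-rule formula for the derivative of the determinant can be checked numerically. The following is a minimal sketch, assuming the pencil has already been reduced to triangular form so that only the diagonals a and b matter; the function names and the sample values are illustrative and not taken from the source.

```python
def pencil_det(a, b, lam):
    """det(lam*T_A + T_B) for triangular T_A, T_B with diagonals a and b."""
    prod = 1.0
    for aj, bj in zip(a, b):
        prod *= lam * aj + bj
    return prod


def pencil_det_derivative(a, b, lam):
    """d/dlam det(lam*T_A + T_B) = sum_i a_i * prod_{j != i} (lam*a_j + b_j)."""
    total = 0.0
    for i, ai in enumerate(a):
        term = ai
        for j, (aj, bj) in enumerate(zip(a, b)):
            if j != i:
                term *= lam * aj + bj
        total += term
    return total


# Hypothetical diagonals; the zero in a corresponds to an infinite eigenvalue.
a = [2.0, 0.5, 0.0]
b = [1.0, -3.0, 4.0]
lam, h = 0.7, 1e-6

# Central finite difference as an independent check of the closed form.
finite_difference = (pencil_det(a, b, lam + h) - pencil_det(a, b, lam - h)) / (2 * h)
closed_form = pencil_det_derivative(a, b, lam)
```

Terms with $a_i = 0$ drop out of the sum automatically, which is the case treated by the second formula.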



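We can therefore use Newton's method to solve for the coefficients $\{\alpha_j\}_{j=1}^n$
which zero the function

$$f(\alpha) = \begin{pmatrix} f_1(\alpha) \\ f_2(\alpha) \\ \vdots \\ f_n(\alpha) \end{pmatrix} = \begin{pmatrix} \det(A_0 - \mu_1 I_n + \sum_{k=1}^n \alpha_k A_k) \\ \det(A_0 - \mu_2 I_n + \sum_{k=1}^n \alpha_k A_k) \\ \vdots \\ \det(A_0 - \mu_n I_n + \sum_{k=1}^n \alpha_k A_k) \end{pmatrix}$$

because we can get the Jacobian elements $[J]_{ij} = \partial f_i(\alpha)/\partial \alpha_j$ by setting $\lambda = \alpha_j$,
$A = A_j$ and

$$B = A_0 - \mu_i I_n + \sum_{k=1,\, k \ne j}^n \alpha_k A_k.$$

This is consistent with $\det(\lambda A + B) = f_i(\alpha)$.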

These expressions appear computationally costly but in many situations the Aj 
have rank one or perhaps two and this reduces the computational cost significantly. 
In addition, there is no need to compute eigenvectors at each stage, as is normally 
required [29]. 

A Newton method based on these results takes as input an affine set $\{A_k\}_{k=0}^n$, an
initial estimate $\alpha^{(0)}$ and a target spectrum $S = \{\mu_k\}_{k=1}^n$ and finds, if the process
converges, the affine coefficients $\alpha$ such that $\sigma(A_0 + \sum_{k=1}^n \alpha_k A_k) = S$. We describe
the algorithm only for the step in which the Jacobian matrix is computed (see [17]
for details).

a. For each $i = 1, 2, \ldots, n$

   i. Compute $\det(H)$, $H = A_0 - \mu_i I_n + \sum_{k=1}^n \alpha_k^{(m)} A_k$.
   ii. For each $k = 1, 2, \ldots, n$

      1. Compute $B = A_0 - \mu_i I_n + \sum_{j=1,\, j \ne k}^n \alpha_j A_j$.

      2. Use the QZ algorithm to get unitary Q, R, $\det(Q) = \det(R) = 1$, which
         simultaneously triangularize $A_k$, B.

      3. Reorder the pairs $(a_j, b_j)$ so that $a_{r+1} = a_{r+2} = \cdots = a_n = 0$.

Remarks 

• Any factoring from which the determinant can be easily computed will suffice. 

• Matrix H changes only slightly between values of i so a factoring based on 
updates would save considerable effort. 

• The Jacobian calculation is well suited to parallel computation. 

• Quite good heuristics for starting values exist. 
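The determinant-based Newton iteration can be sketched on a tiny example. This is an illustration under stated assumptions, not the implementation of [17]: the affine elements are assumed to be rank one, in which case $\det(H + A_j) - \det(H)$ equals $\partial \det(H)/\partial \alpha_j$ exactly (the determinant is affine in each $\alpha_j$), so this shortcut replaces the QZ-based formula. The helper names, the 2 x 2 test problem and the starting value are all hypothetical.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def solve(J, rhs):
    """Solve J x = rhs by Cramer's rule (adequate for tiny systems only)."""
    dJ = det(J)
    x = []
    for c in range(len(J)):
        Jc = [row[:] for row in J]
        for r in range(len(J)):
            Jc[r][c] = rhs[r]
        x.append(det(Jc) / dJ)
    return x


def shifted(A0, As, alpha, mu):
    """H = A0 - mu*I + sum_k alpha_k * A_k."""
    n = len(A0)
    H = [[A0[r][c] - (mu if r == c else 0.0) for c in range(n)] for r in range(n)]
    for a, Ak in zip(alpha, As):
        for r in range(n):
            for c in range(n):
                H[r][c] += a * Ak[r][c]
    return H


def newton(A0, As, targets, alpha, steps=20):
    """Newton iteration on f_i(alpha) = det(A0 - mu_i I + sum_k alpha_k A_k)."""
    n = len(A0)
    for _ in range(steps):
        f, J = [], []
        for mu in targets:
            H = shifted(A0, As, alpha, mu)
            dH = det(H)
            f.append(dH)
            row = []
            for Aj in As:
                # Assumption: A_j has rank one, so this difference is the exact derivative.
                Hj = [[H[r][c] + Aj[r][c] for c in range(n)] for r in range(n)]
                row.append(det(Hj) - dH)
            J.append(row)
        step = solve(J, [-v for v in f])
        alpha = [a + s for a, s in zip(alpha, step)]
    return alpha


# Hypothetical problem: choose the diagonal of [[a1, 1], [1, a2]] so the spectrum is {0, 4}.
A0 = [[0.0, 1.0], [1.0, 0.0]]
As = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
alpha = newton(A0, As, targets=[0.0, 4.0], alpha=[0.5, 3.0])
```

No eigenvalue pairing is needed: the order in which the targets are listed does not affect the zero set of f.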



12.7.2.1 Inverse Eigenvalue Problems for Higher Degree 
Matrix Polynomials 

The method of Sect. 12.7 is applicable to the solution of inverse eigenvalue 
problems with higher degree matrix polynomials. To illustrate the technique we 
show how it can be used on quadratic matrix polynomial problems. 
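We define $f_i(\alpha, \beta)$, $i = 1, 2, 3, \ldots, 2n$ by

$$f_i(\alpha, \beta) = \det\Big(\mu_i^2 M + \mu_i\Big(C_0 + \sum_{k=1}^n \alpha_k C_k\Big) + K_0 + \sum_{k=1}^n \beta_k K_k\Big).$$

The $2n \times 2n$ Jacobian for f has two $2n \times n$ blocks, $J(\alpha^{(m)}, \beta^{(m)}) = (J_1 \ J_2)$,
where $J_1$ has the partial derivatives of $f(\alpha, \beta)$ with respect to $\alpha$ and $J_2$ has the partial
derivatives of $f(\alpha, \beta)$ with respect to $\beta$. We replace the matrix H of step (a)i by

$$H = \mu_i^2 M + \mu_i\Big(C_0 + \sum_{k=1}^n \alpha_k C_k\Big) + K_0 + \sum_{k=1}^n \beta_k K_k$$

and the matrix B of step (a)ii.1 is replaced by

$$B = \mu_i^2 M + \mu_i\Big(C_0 + \sum_{j=1,\, j \ne k}^n \alpha_j C_j\Big) + K_0 + \sum_{j=1}^n \beta_j K_j$$

when computing $J_1$ and by

$$B = \mu_i^2 M + \mu_i\Big(C_0 + \sum_{j=1}^n \alpha_j C_j\Big) + K_0 + \sum_{j=1,\, j \ne k}^n \beta_j K_j$$

when computing $J_2$.
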



Example 7.2 Construct a real symmetric triple with prescribed real eigenvalues.
This is an example of Problem 7.1 and has application in the passive vibration
control of a mass-spring-damper system.

The system is $M = \mathrm{diag}\{1, 2, 3\}$,




Co = 



and the affine family elements 

1 -1 0' 

Ci =.firi = I -1 1 

0. 



Ko= \ -4- 11 
-7 



Co Ko 




I 0\ 
C3=K3= \ . 
\0 0/ 

The method finds many real and complex solutions. Taking six steps from the 
starting value 

a(''' = (0 10)^, ^(°' = (10 50 10)^. 

the method finds the real affine coefficients

$$\alpha = (-0.700525,\ 0.128550,\ 14.046055)^T, \quad \beta = (4.608802,\ 70.160636,\ 8.151792)^T$$

which assign the eigenvalues $-3 \pm i$, $-3 \pm 2i$, $-3 \pm 3i$. Starting instead from
a(o) = (0 0)^, /J(°' = (50 10 10)^. 
the method takes five steps to find a different solution

$$\alpha = (-0.324894,\ 0.367479,\ 13.164036)^T, \quad \beta = (53.234508,\ 13.732452,\ 6.945330)^T$$

which assigns the same eigenvalues.

Example 7.3 Finding the damping matrix. This example is a special case of 
Problem 7.1 in which we are given the matrices M and K and we seek the damping 
matrix C. More precisely we want to solve 



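Problem 7.2 Given a symmetric tridiagonal $n \times n$ matrix K and a set $S = \{\mu_k\}_{k=1}^{2n}$
of self conjugate scalars such that $\det(K) = \prod_{k=1}^{2n} \mu_k$, find a symmetric,
tridiagonal C, $n \times n$, such that $\sigma(\lambda^2 I + \lambda C + K)$ is S.

The necessity of $\det(K) = \prod_{k=1}^{2n} \mu_k$ follows from substituting $\lambda = 0$ in
$\det(\lambda^2 I + \lambda C + K) = \prod_{k=1}^{2n} (\lambda - \mu_k)$. The problem generalizes easily to diagonal
M replacing I.

We use $C = \sum_{k=1}^{2n-1} \alpha_k C_k$, with the affine family

$$C_i = e_i e_i^T, \quad i = 1, 2, \ldots, n,$$
$$C_{n+i} = (e_i - e_{i+1})(e_i - e_{i+1})^T, \quad i = 1, 2, \ldots, n - 1.$$

As before $f(\alpha)$ is defined by $f_i(\alpha) = \det\big(\mu_i^2 I + \mu_i \sum_{k=1}^{2n-1} \alpha_k C_k + K\big)$,
$i = 1, 2, \ldots, 2n - 1$ and the Jacobian is now $2n - 1$ square,

$$[J]_{ij} = \frac{\partial f_i}{\partial \alpha_j}, \quad i, j = 1, 2, \ldots, 2n - 1,$$

with

$$H = \mu_i^2 I + \mu_i \sum_{k=1}^{2n-1} \alpha_k C_k + K \quad \text{and} \quad B = \mu_i^2 I + \mu_i \sum_{j=1,\, j \ne k}^{2n-1} \alpha_j C_j + K.$$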
k=\ j=i 



Mk 



Example 



K 



C, = 
The starting value $\alpha^{(0)} = (1, 1, 1, 1, 1)^T$ finds

$$\alpha = (0.722105,\ 2.002012,\ 0.209320,\ 1.936412,\ 2.132404)^T$$

to an accuracy of 10^'^in eight steps. This solution a corresponds to the damping 
matrix 

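$$C = \begin{pmatrix} 2.658517 & -1.936412 & 0 \\ -1.936412 & 6.070828 & -2.132404 \\ 0 & -2.132404 & 2.341724 \end{pmatrix}$$

which is such that $\sigma(\lambda^2 I + \lambda C + K) = \{-4\sqrt{2},\ -\sqrt{2},\ -1 \pm i,\ -1 \pm 2i\}$.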
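The damping matrix can be reassembled from the reported coefficients as a consistency check. The sketch below builds $C = \sum_k \alpha_k C_k$ from the affine family $C_i = e_i e_i^T$, $C_{n+i} = (e_i - e_{i+1})(e_i - e_{i+1})^T$, and compares $-\mathrm{trace}(C)$ with the sum of the assigned eigenvalues (for a monic quadratic pencil the two agree). The function name is hypothetical, and the two real eigenvalues are taken to be $-4\sqrt{2}$ and $-\sqrt{2}$, an assumption that is consistent with the trace.

```python
import math


def affine_damping(alpha, n):
    """C = sum_i alpha_i e_i e_i^T + sum_i alpha_{n+i} (e_i - e_{i+1})(e_i - e_{i+1})^T."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        C[i][i] += alpha[i]
    for i in range(n - 1):
        a = alpha[n + i]
        C[i][i] += a
        C[i + 1][i + 1] += a
        C[i][i + 1] -= a
        C[i + 1][i] -= a
    return C


alpha = [0.722105, 2.002012, 0.209320, 1.936412, 2.132404]
C = affine_damping(alpha, 3)

# Sum of the six eigenvalues of lambda^2 I + lambda C + K equals -trace(C).
trace_C = sum(C[i][i] for i in range(3))
eig_sum = -4 * math.sqrt(2) - math.sqrt(2) + 2 * (-1.0) + 2 * (-1.0)
```

The agreement is limited to about six digits because the coefficients are reported to six decimals.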



12.8 Symmetry Preserving Partial Pole Assignment 
for the Standard and the Generalized Eigenvalue 
Problems 

The problem of symmetry preserving pole assignment for the quadratic pencil has 
received much attention [6, 11, 12, 24, 25, 26, 34] but remains a considerable 
challenge. In this section we describe a pole assignment method for the general- 
ized inverse eigenvalue problem which has an explicit solution and which pre- 
serves symmetry. 

Consider a system modelled by the differential equation

$$B x'(t) = A x(t) + b u(t), \quad x(0) = x_0,$$

with $A, B \in \mathbb{R}^{n \times n}$, symmetric and $b \in \mathbb{R}^n$. We seek a single input control $u(t) = f^T x(t) - g^T x'(t)$
which is such that the closed loop system

$$(B + b g^T) x'(t) = (A + b f^T) x(t)$$

has prescribed frequencies. This leads to the linear algebra problem of finding
vectors f, g which assign part or all of the spectrum of the modified pencil

$$(A + b f^T) - \lambda (B + b g^T).$$

Now, many finite element models lead to real symmetric A, B but rank-one
updates of the form $b f^T$, $b g^T$ destroy symmetry. Furthermore, sometimes (the
control of vibratory systems by passive elements is an example) we need the
closed-loop system to satisfy a reciprocity law: the force at $x_1$ due to a unit
displacement at $x_2$ should equal the force at $x_2$ due to a unit displacement at $x_1$. In
such a case we need to find symmetric controls such as

$$P_s(\lambda) = (A + \alpha u u^T) - \lambda (B + \beta u u^T). \qquad (12.24)$$




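It is well known [29] that the eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and eigenvectors of

• the symmetric matrix $A \in \mathbb{R}^{n \times n}$ are real and those of

• the real, symmetric definite (B spd) pair A, B are real.

In addition, the eigenvalues of A and those $\mu_1 \le \mu_2 \le \cdots \le \mu_n$ of $A + \sigma v v^T$,
$\sigma = \pm 1$, interlace:

$$\lambda_j \le \mu_j \le \lambda_{j+1} \ \text{if } \sigma = 1 \quad \text{and} \quad \mu_j \le \lambda_j \le \mu_{j+1} \ \text{if } \sigma = -1, \qquad j = 1, 2, \ldots, n$$

($\lambda_{n+1} = \mu_{n+1} = \infty$). Similarly, the eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ of $A - \lambda B$ and
those $\mu_1 \le \mu_2 \le \cdots \le \mu_n$ of $A + \alpha u u^T - \lambda(B + \beta u u^T)$ interlace [19]:

$$\begin{array}{ll}
\alpha/\beta \le \mu_1 \le \lambda_1 & \text{if } \alpha/\beta < \lambda_1, \\
\lambda_j \le \mu_j \le \lambda_{j+1} & \text{if } \lambda_{j+1} < \alpha/\beta, \\
\lambda_j \le \mu_j \le \alpha/\beta \le \mu_{j+1} \le \lambda_{j+1} & \text{if } \lambda_j < \alpha/\beta < \lambda_{j+1}, \\
\lambda_j \le \mu_{j+1} \le \lambda_{j+1} & \text{if } \alpha/\beta < \lambda_j, \\
\lambda_n \le \mu_n \le \alpha/\beta & \text{if } \lambda_n < \alpha/\beta,
\end{array} \qquad j = 1, 2, \ldots, n. \qquad (12.25)$$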



Thus, only poles which interlace appropriately can be assigned.
Löwner [28] solved the symmetry preserving partial pole assignment for the
standard eigenvalue problem by showing that choosing

$$\hat{v}_i^2 = \sigma\, \frac{\prod_{k=1}^n (\mu_k - \lambda_i)}{\prod_{k=1,\, k \ne i}^n (\lambda_k - \lambda_i)}, \quad i = 1, 2, \ldots, n \qquad (12.26)$$

where $\hat{v} = Q^T v$, $A = Q \Lambda Q^T$, Q orthogonal, $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, assigns an
appropriately interlacing set of scalars $\{\mu_i\}_{i=1}^n$ to the spectrum of $A + \sigma v v^T$.

Similarly, there is an explicit formula [16] for the vector u which assigns part or
all of the spectrum of $A + \alpha u u^T - \lambda(B + \beta u u^T)$.

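The method leaves unchanged the eigenvalues not to be replaced, again
ensuring no spillover, and only those $\lambda_i$ which are to be replaced need to be known.
Our problem is now stated as

Problem 8.1 Given

a. $A, B \in \mathbb{R}^{n \times n}$, symmetric and B spd with the spectrum of $A - \lambda B$ labelled
$\lambda_1 \le \lambda_2 \le \lambda_3 \le \cdots \le \lambda_n$,

b. scalars $\alpha > 0$, $\beta > 0$, $\alpha/\beta \ne \lambda_j$, $\forall j$, and

c. $\{\mu_j\}_{j=1}^r$, $r \le n$, which satisfy the interlacing property (12.25),

find $u \in \mathbb{R}^n$ such that the spectrum of the pair $A + \alpha u u^T$, $B + \beta u u^T$ contains the
prescribed $\{\mu_j\}_{j=1}^r$, with the remaining eigenvalues left unchanged.

Our method uses the secular equation for a symmetric pair of matrices [19]
which we briefly review here.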
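Suppose that Y simultaneously diagonalizes A and B with the scaling
$Y^T A Y = \Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, $Y^T B Y = I$. If x, $\mu$ is an eigenpair of (12.24)
then $(A + \alpha u u^T) x = \mu (B + \beta u u^T) x$ and using the substitution $\hat{u} = Y^T u$ and $Y \hat{x} = x$
we quickly get

$$(\Lambda + \alpha \hat{u} \hat{u}^T) \hat{x} = \mu (I + \beta \hat{u} \hat{u}^T) \hat{x}. \qquad (12.27)$$

Lemma 8.1 Suppose the $\lambda_j$ are distinct and assume also $e_j^T \hat{u} \ne 0$ and $\lambda_j \ne \alpha/\beta$ for
all j. Then, (a) $(\mu I - \Lambda)$ is invertible, (b) $\mu \ne \alpha/\beta$, and (c) $\hat{x}^T \hat{u} \ne 0$.

Rearranging (12.27) as $(\mu I - \Lambda)\hat{x} = (\alpha - \mu\beta)\hat{u}(\hat{u}^T \hat{x})$ and multiplying on the left
by $\hat{u}^T (\mu I - \Lambda)^{-1}$ yields a secular equation for the symmetric pair,
$(\alpha - \mu\beta)\, \hat{u}^T (\mu I - \Lambda)^{-1} \hat{u} = 1$, which is sometimes written
componentwise as

$$g(\mu) = 1 - (\alpha - \mu\beta) \sum_{j=1}^n \frac{\hat{u}_j^2}{\mu - \lambda_j} = 0.$$

The zeros of g are the eigenvalues of the pencil (12.24) and its poles are the eigenvalues of the pencil
$A - \lambda B$. Figure 12.1 shows the example secular equation function

$$g(\mu) = 1 - (3\mu - 7)\left(\frac{1/4}{1 - \mu} + \frac{1/9}{2 - \mu} + \frac{1/16}{3 - \mu}\right). \qquad (12.28)$$
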



The poles, at 1,2, 3, are the eigenvalues of the original system and the zeros, 
approximately 2.9233, 2.0913, 1.4196 are the eigenvalues of the system after rank- 
one corrections are added to each of the matrices in the pencil. 
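The quoted zeros can be reproduced by bisection between the poles. This is a minimal sketch, assuming the componentwise form $g(\mu) = 1 - (\alpha - \mu\beta)\sum_j \hat{u}_j^2/(\mu - \lambda_j)$ with $\alpha = 7$, $\beta = 3$, poles 1, 2, 3 and weights 1/4, 1/9, 1/16, which is algebraically the same function as (12.28). The bracketing intervals use the fact that $g(\alpha/\beta) = 1$; the helper names are illustrative.

```python
def g(mu, lam=(1.0, 2.0, 3.0), w=(1 / 4, 1 / 9, 1 / 16), alpha=7.0, beta=3.0):
    """Secular function g(mu) = 1 - (alpha - mu*beta) * sum_j w_j / (mu - lam_j)."""
    return 1.0 - (alpha - mu * beta) * sum(wj / (mu - lj) for wj, lj in zip(w, lam))


def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


eps = 1e-9
# One zero between the poles 1 and 2, and one on each side of alpha/beta = 7/3.
brackets = [(1 + eps, 2 - eps), (2 + eps, 7 / 3), (7 / 3, 3 - eps)]
zeros = [bisect(g, lo, hi) for lo, hi in brackets]
```

The three brackets mirror the interlacing pattern (12.25) for the case $\lambda_2 < \alpha/\beta < \lambda_3$.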

Using the secular equation we can devise a symmetry preserving full and partial 
pole assignment method for the symmetric definite matrix pair. 

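Fig. 12.1 The secular equation function $g(\mu)$ of (12.28)

Suppose first that we wish to perform a full pole assignment with $\{\mu_j\}_{j=1}^n$
appropriately defined. The equations $g(\mu_j) = 0$, $j = 1, 2, \ldots, n$ can be assembled
into a matrix equation

$$\begin{pmatrix}
\frac{1}{\mu_1 - \lambda_1} & \frac{1}{\mu_1 - \lambda_2} & \cdots & \frac{1}{\mu_1 - \lambda_n} \\
\frac{1}{\mu_2 - \lambda_1} & \frac{1}{\mu_2 - \lambda_2} & \cdots & \frac{1}{\mu_2 - \lambda_n} \\
\vdots & \vdots & & \vdots \\
\frac{1}{\mu_n - \lambda_1} & \frac{1}{\mu_n - \lambda_2} & \cdots & \frac{1}{\mu_n - \lambda_n}
\end{pmatrix}
\begin{pmatrix} \hat{u}_1^2 \\ \hat{u}_2^2 \\ \vdots \\ \hat{u}_n^2 \end{pmatrix}
=
\begin{pmatrix} \frac{1}{\alpha - \beta\mu_1} \\ \frac{1}{\alpha - \beta\mu_2} \\ \vdots \\ \frac{1}{\alpha - \beta\mu_n} \end{pmatrix}$$

in which the Cauchy matrix, C, on the left, has explicit inverse

$$[C^{-1}]_{ij} = \frac{(\mu_j - \lambda_i) \prod_{k \ne j} (\lambda_i - \mu_k) \prod_{k \ne i} (\mu_j - \lambda_k)}{\prod_{k \ne j} (\mu_j - \mu_k) \prod_{k \ne i} (\lambda_i - \lambda_k)}$$

from which we can quickly get

$$\hat{u}_i^2 = \sum_{j=1}^n \frac{[C^{-1}]_{ij}}{\alpha - \beta\mu_j} = \frac{\prod_{k=1}^n (\lambda_i - \mu_k) \prod_{k=1,\, k \ne i}^n (\beta\lambda_k - \alpha)}{\prod_{k=1}^n (\beta\mu_k - \alpha) \prod_{k=1,\, k \ne i}^n (\lambda_i - \lambda_k)}, \quad i = 1, 2, \ldots, n.$$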

Thus, taking signs into account shows there are $2^n$ different solutions. We note
that computing the $\hat{u}_j$ by this explicit formula requires $6n(n - 1)$ multiplications and divisions
and is more economical than the $(n^3 + 6n^2 + 2n)/3$ operations required to solve
the system numerically. There are, however, numerical considerations which may
make the explicit formula undesirable for certain distributions of the $\lambda_i$ and $\mu_j$.
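A closed form for the squared components can be tested directly against the secular equation. The sketch below assumes the product formula $\hat{u}_i^2 = \prod_k(\lambda_i - \mu_k)\prod_{k \ne i}(\beta\lambda_k - \alpha) \,/\, \big(\prod_k(\beta\mu_k - \alpha)\prod_{k \ne i}(\lambda_i - \lambda_k)\big)$, which follows from clearing denominators in $g$ and evaluating at $\mu = \lambda_i$ and $\mu = \alpha/\beta$. The target values and function names are hypothetical, chosen to satisfy the interlacing property for $\lambda = (1, 2, 3)$ and $\alpha/\beta = 7/3$.

```python
def assign_u_squared(lam, mu, alpha, beta):
    """Squared components u_i^2 that place the zeros of the secular function at mu."""
    n = len(lam)
    w = []
    for i in range(n):
        num, den = 1.0, 1.0
        for k in range(n):
            num *= lam[i] - mu[k]
            den *= beta * mu[k] - alpha
            if k != i:
                num *= beta * lam[k] - alpha
                den *= lam[i] - lam[k]
        w.append(num / den)
    return w


def secular(mu_val, lam, w, alpha, beta):
    """g(mu) = 1 - (alpha - mu*beta) * sum_j w_j / (mu - lam_j)."""
    return 1.0 - (alpha - mu_val * beta) * sum(
        wj / (mu_val - lj) for wj, lj in zip(w, lam)
    )


lam = [1.0, 2.0, 3.0]      # open loop eigenvalues
mu = [1.5, 2.1, 2.9]       # hypothetical targets that interlace as in (12.25)
alpha, beta = 7.0, 3.0

w = assign_u_squared(lam, mu, alpha, beta)
residuals = [secular(m, lam, w, alpha, beta) for m in mu]
```

Each component $\hat{u}_i$ is then $\pm\sqrt{w_i}$, which is where the $2^n$ sign choices come from. A non-interlacing target set would show up as a negative entry in w.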

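Partial pole assignment is now straightforward. Suppose that $S = \{i_1, i_2, \ldots, i_r\}$
is a subset of $1, 2, \ldots, n$, $r < n$, and that T is the complement of S. Let $\alpha$, $\beta$, $\{\lambda_i\}_{i \in S}$
and $\{\mu_i\}_{i \in S}$ satisfy the interlacing property (12.25). We set $\hat{u}_i = 0$ for all $i \in T$ and
compute

$$\hat{u}_i^2 = \frac{\prod_{k \in S} (\lambda_i - \mu_k) \prod_{k \in S,\, k \ne i} (\beta\lambda_k - \alpha)}{\prod_{k \in S} (\beta\mu_k - \alpha) \prod_{k \in S,\, k \ne i} (\lambda_i - \lambda_k)}, \quad i \in S. \qquad (12.29)$$

For each component of $\hat{u}$ which is zero, the secular equation has one fewer term
and the Cauchy matrix has one fewer row and column.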

The numerical properties of this method will be reported elsewhere. While the
tests conducted so far have not been exhaustive, it is clear that the method, even
applied to problems with n = 1,024 and with assigned eigenvalues placed very
close to the open loop system eigenvalues, can return assigned eigenvalues which
have lost no more than about four decimals of accuracy. However, the distribution
of the existing and assigned eigenvalues and their respective condition numbers in
the problem will certainly play an important role in determining the accuracy of the
method in a particular case. Based on the test results obtained so far we believe
that the method will produce very accurate assignments for even quite large
systems provided that the point distributions are not pathological.



12.9 Conclusions 

Our interest in most of the problems discussed in this paper was motivated by the 
study of damped vibrating systems and the control of damped oscillatory systems. 
The pole assignment and inverse problems for the symmetric definite linear pencils 
have the very useful property that all the eigenpairs in the system are real. Thus, 
we have described the solution to an inverse problem for a particular symmetric 
definite pair and we have shown how to assign part or all of the spectrum of a 
symmetric definite pair while preserving the system's symmetry. 

The eigenpairs of real, symmetric definite quadratic pencils may contain com- 
plex elements. This fact considerably complicates the solution of inverse and pole 
assignment problems for these pencils. We have described a solution for the inverse 
problem of constructing a symmetric, tridiagonal, monic quadratic pencil with two 
spectral sets prescribed. We have described methods for pole assignment and partial 
eigenstructure assignment in symmetric definite quadratic pencils controlled by 
either single or multi-input controls. We have also described symmetry preserving 
pole assignment to symmetric definite quadratic pencils by affine methods. 

Further work needs to be done on the characterization of those eigendata which, 
when assigned, lead to symmetric definite quadratic pencils with the kind of 
properties that can be realized in a physical system. 
Chapter 13

Descent Methods for Nonnegative Matrix Factorization

Ngoc-Diep Ho, Paul Van Dooren and Vincent D. Blondel 



Abstract In this paper, we present several descent methods that can be applied to 
nonnegative matrix factorization and we analyze a recently developed fast block 
coordinate method called Rank-one Residue Iteration (RRI). We also give a 
comparison of these different methods and show that the new block coordinate 
method has better properties in terms of approximation error and complexity. By 
interpreting this method as a rank-one approximation of the residue matrix, we 
prove that it converges and also extend it to the nonnegative tensor factorization 
and introduce some variants of the method by imposing some additional con- 
trollable constraints such as: sparsity, discreteness and smoothness. 



13.1 Introduction 

Linear algebra has become a key tool in almost all modern techniques for data 
analysis. Most of these techniques make use of linear subspaces represented by 
eigenvectors of a particular matrix. In this paper, we consider a set of n data points
$a_1, a_2, \ldots, a_n$, where each point is a real vector of size m, $a_i \in \mathbb{R}^m$. We then
approximate these data points by linear combinations of r basis vectors $u_j \in \mathbb{R}^m$:

$$a_i \approx \sum_{j=1}^r v_{ij} u_j, \quad v_{ij} \in \mathbb{R},\ u_j \in \mathbb{R}^m.$$

This can be rewritten in matrix form as $A \approx U V^T$, where $a_i$ and $u_j$ are
respectively the columns of A and U and the $v_{ij}$'s are the elements of V. Optimal
solutions of this approximation in terms of the Euclidean (or Frobenius) norm can
be obtained by the Singular Value Decomposition (SVD) [11].

In many cases, data points are constrained to a subset of $\mathbb{R}^m$. For example, light
intensities, concentrations of substances, absolute temperatures are, by their nature,
nonnegative (or even positive) and lie in the nonnegative orthant $\mathbb{R}^m_+$. The
input matrix A then becomes elementwise nonnegative and it is then natural to
constrain the basis vectors $u_j$ and the coefficients $v_{ij}$ to be nonnegative as well. In
order to satisfy this constraint, we need to approximate the columns of A by the
following additive model:

$$a_i \approx \sum_{j=1}^r v_{ij} u_j$$

where the $v_{ij}$ coefficients and $u_j$ vectors are nonnegative, $v_{ij} \in \mathbb{R}_+$, $u_j \in \mathbb{R}^m_+$.

Many algorithms have been proposed to find such a representation, which is 
referred to as a Nonnegative Matrix Factorization (NMF). The earliest algorithms 
were introduced by Paatero [23, 24]. But the topic became quite popular with the 
publication of the algorithm of Lee and Seung in 1999 [18] where multiplicative 
rules were introduced to solve the problem. This algorithm is very simple and 
elegant but it lacks a complete convergence analysis. Other methods and variants 
can be found in [15, 19, 20]. 

The quality of the approximation is often measured by a distance. Two popular 
choices are the Euclidean (Frobenius) norm and the generalized Kullback-Leibler 
divergence. In this paper, we focus on the Euclidean distance and we investigate 
descent methods for this measure. One characteristic of descent methods is their
monotonic decrease until they reach a stationary point. This point may be located in
the interior of the nonnegative orthant or on its boundary. In the second case, the
constraints become active and may prohibit any further decrease of the distance
measure. This is a key issue to be analyzed for any descent method.

In this paper, $\mathbb{R}^m_+$ denotes the set of nonnegative real vectors (elementwise) and
$[v]_+$ the projection of the vector v on $\mathbb{R}^m_+$. We use $v \ge 0$ and $A \ge 0$ to denote
nonnegative vectors and matrices and $v > 0$ and $A > 0$ to denote positive vectors
and matrices. $A \circ B$ and $\frac{[A]}{[B]}$ are respectively the Hadamard (elementwise) product
and quotient. $A_{:i}$ and $A_{i:}$ are the ith column and ith row of A.

This paper is an extension of the internal report [14], where we proposed to 
decouple the problem based on rank one approximations to create a new algorithm 
called Rank-one Residue Iteration (RRI). During the revision of this report, we 
were informed that essentially the same algorithm was independently proposed 
and published in [7] under the name Hierarchical Alternating Least Squares
(HALS). But the present paper gives several additional results wherein the major
contributions are the convergence proof of the method and its extensions to many 
practical situations and constraints. The paper also compares a selection of some 
recent descent methods from the literature and aims at providing a survey of such 
methods for nonnegative matrix factorizations. For that reason, we try to be self- 
contained and hence recall some well-known results. We also provide short proofs 
when useful for a better understanding of the rest of the paper. 

We first give a short introduction of low rank approximations, both uncon- 
strained and constrained. In Sect. 13.3 we discuss error bounds of various 
approximations and in Sect. 13.4 we give a number of descent methods for 
Nonnegative Matrix Factorizations. In Sect. 13.5 we describe the method based on 
successive rank one approximations. This method is then also extended to 
approximate higher order tensor and to take into account other constraints than 
nonnegativity. In Sect. 13.5 we discuss various regularization methods and in Sect. 
13.6, we present numerical experiments comparing the different methods. We end 
with some concluding remarks. 



13.2 Low-Rank Matrix Approximation 



Low-rank approximation is a special case of the matrix nearness problem [12]. When
only a rank constraint is imposed, the optimal approximation with respect to the 
Frobenius norm can be obtained from the Singular Value Decomposition. 

We first investigate the problem without the nonnegativity constraint on the 
low-rank approximation. This is useful for understanding properties of the 
approximation when the nonnegativity constraints are imposed but inactive. We 
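begin with the well-known Eckart-Young Theorem.

Theorem 2.1 (Eckart-Young) Let $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) have the singular value
decomposition

$$A = P \Sigma Q^T, \quad \Sigma = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ are the singular values of A and where $P \in \mathbb{R}^{m \times m}$
and $Q \in \mathbb{R}^{n \times n}$ are orthogonal matrices. Then for $1 \le r \le n$, the matrix

$$A_r = P \Sigma_r Q^T, \quad \Sigma_r = \begin{pmatrix} \sigma_1 & & & & & \\ & \ddots & & & & \\ & & \sigma_r & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \\ 0 & \cdots & & & \cdots & 0 \end{pmatrix}$$

is a global minimizer of the problem

$$\min_{B \in \mathbb{R}^{m \times n},\ \mathrm{rank}(B) \le r} \frac{1}{2} \|A - B\|_F^2 \qquad (13.1)$$

and its error is

$$\frac{1}{2} \|A - A_r\|_F^2 = \frac{1}{2} \sum_{i=r+1}^n \sigma_i^2.$$

Moreover, if $\sigma_r > \sigma_{r+1}$ then $A_r$ is the unique global minimizer.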

The proof and other implications can be found for instance in [11]. The columns 
of P and Q are called singular vectors of A, in which vectors corresponding to the 
largest singular values are referred to as the dominant singular vectors. 
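For r = 1 the Eckart-Young error can be verified on a small matrix. The sketch below computes the dominant singular triplet by power iteration on $A^T A$, a swapped-in technique chosen only to keep the example dependency-free; the source itself refers to the SVD. The test matrix is hypothetical, with $A^T A$ having eigenvalues 40 and 10, so the expected error is $\frac{1}{2}\sigma_2^2 = 5$.

```python
import math


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def dominant_triplet(A, iters=200):
    """Dominant singular value and vectors via power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] + [0.0] * (len(A[0]) - 1)
    for _ in range(iters):
        v = matvec(At, matvec(A, v))
        nv = norm(v)
        v = [x / nv for x in v]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return sigma, u, v


A = [[4.0, 0.0], [3.0, -5.0]]
sigma1, u, v = dominant_triplet(A)

# Best rank-one approximation A_1 = sigma_1 u v^T and its error 0.5 * ||A - A_1||_F^2.
A1 = [[sigma1 * ui * vj for vj in v] for ui in u]
err = 0.5 * sum((A[i][j] - A1[i][j]) ** 2 for i in range(2) for j in range(2))
```

Power iteration needs a starting vector with a component along the dominant singular vector, and it converges slowly when $\sigma_1$ and $\sigma_2$ are close.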

Let us now look at the following modified problem

$$\min_{X \in \mathbb{R}^{m \times r},\ Y \in \mathbb{R}^{n \times r}} \frac{1}{2} \|A - X Y^T\|_F^2 \qquad (13.2)$$

where the rank constraint is implicit in the product $X Y^T$ since the dimensions of X
and Y guarantee that $\mathrm{rank}(X Y^T) \le r$. Conversely, every matrix of rank at most r
can be trivially rewritten as a product $X Y^T$, where $X \in \mathbb{R}^{m \times r}$ and $Y \in \mathbb{R}^{n \times r}$.
Therefore Problems (13.1) and (13.2) are equivalent. But even when the product
$A_r = X Y^T$ is unique, the pairs $(X R^T, Y R^{-1})$ with R invertible yield the same
product $X Y^T$. In order to avoid this, we can always choose X and Y such that

$$X = P D^{1/2} \quad \text{and} \quad Y = Q D^{1/2}, \qquad (13.3)$$

where $P^T P = I_{r \times r}$, $Q^T Q = I_{r \times r}$ and D is an $r \times r$ nonnegative diagonal matrix. Doing
this is equivalent to computing a compact SVD decomposition of the product
$A_r = X Y^T = P D Q^T$.

As usual for optimization problems, we calculate the gradient with respect to X
and Y and set them equal to 0.

$$\nabla_X = X Y^T Y - A Y = 0, \qquad \nabla_Y = Y X^T X - A^T X = 0. \qquad (13.4)$$
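The gradient expression $X Y^T Y - A Y$ can be checked against central finite differences of the cost $\frac{1}{2}\|A - X Y^T\|_F^2$. The following is a small sketch with hypothetical data and helper names; since the cost is quadratic in X, the finite difference agrees with the formula up to rounding.

```python
def matmul(P, Q):
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


def transpose(M):
    return [list(col) for col in zip(*M)]


def cost(A, X, Y):
    """0.5 * ||A - X Y^T||_F^2."""
    R = matmul(X, transpose(Y))
    return 0.5 * sum(
        (A[i][j] - R[i][j]) ** 2 for i in range(len(A)) for j in range(len(A[0]))
    )


def grad_X(A, X, Y):
    """Gradient with respect to X: X Y^T Y - A Y."""
    XYtY = matmul(X, matmul(transpose(Y), Y))
    AY = matmul(A, Y)
    return [[XYtY[i][j] - AY[i][j] for j in range(len(X[0]))] for i in range(len(X))]


# Hypothetical 3 x 2 data with a rank-one factorization (r = 1).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X = [[0.5], [1.0], [-0.3]]
Y = [[1.0], [2.0]]

G = grad_X(A, X, Y)
h = 1e-6
fd = []
for i in range(3):
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[i][0] += h
    Xm[i][0] -= h
    fd.append((cost(A, Xp, Y) - cost(A, Xm, Y)) / (2 * h))
```

The gradient with respect to Y follows by exchanging the roles of X and Y and replacing A with its transpose.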

If we then premultiply $\nabla_X$ by $A^T$ and $\nabla_Y$ by A, we obtain

$$(A^T A) Y = (A^T X) Y^T Y, \qquad (A A^T) X = (A Y) X^T X. \qquad (13.5)$$

Replacing $A^T X = Y X^T X$ and $A Y = X Y^T Y$ into (13.5) yields

$$(A^T A) Y = Y X^T X Y^T Y, \qquad (A A^T) X = X Y^T Y X^T X. \qquad (13.6)$$

Replacing (13.3) into (13.6) yields

$$(A^T A) Q D^{1/2} = Q D P^T P D Q^T Q D^{1/2} \quad \text{and} \quad (A A^T) P D^{1/2} = P D Q^T Q D P^T P D^{1/2}.$$

When D is invertible, this finally yields

$$(A^T A) Q = Q D^2 \quad \text{and} \quad (A A^T) P = P D^2.$$

This shows that the columns of P and Q are singular vectors and the $D_{ii}$'s are
nonzero singular values of A. Notice that if D is singular, one can throw away the
corresponding columns of P and Q and reduce it to a smaller-rank approximation
with the same properties. Without loss of generality, we therefore can focus on
approximations of Problem (13.2) which are of exact rank r. We can summarize
the above reasoning in the following theorem.

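Theorem 2.2 Let $A \in \mathbb{R}^{m \times n}$ ($m \ge n$ and $\mathrm{rank}(A) = t$). If $A_r$ ($1 \le r \le t$) is a rank r
stationary point of Problem (13.2), then there exist two orthogonal matrices
$P \in \mathbb{R}^{m \times m}$ and $Q \in \mathbb{R}^{n \times n}$ such that:

$$A = P \hat{\Sigma} Q^T \quad \text{and} \quad A_r = P \hat{\Sigma}_r Q^T$$

where

$$\hat{\Sigma} = \begin{pmatrix} \hat{\sigma}_1 & & & \\ & \hat{\sigma}_2 & & \\ & & \ddots & \\ & & & \hat{\sigma}_n \\ 0 & \cdots & \cdots & 0 \end{pmatrix}, \quad
\hat{\Sigma}_r = \begin{pmatrix} \hat{\sigma}_1 & & & & & \\ & \ddots & & & & \\ & & \hat{\sigma}_r & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \\ 0 & \cdots & & & \cdots & 0 \end{pmatrix}$$

and the $\hat{\sigma}_i$'s are unsorted singular values of A. Moreover, the approximation
error is

$$\frac{1}{2} \|A - A_r\|_F^2 = \frac{1}{2} \sum_{i=r+1}^n \hat{\sigma}_i^2.$$

This result shows that, if the singular values are all different, there are $\frac{t!}{r!(t-r)!}$
possible stationary points $A_r$. When there are multiple singular values, there will
be infinitely many stationary points $A_r$ since there are infinitely many singular
subspaces. The next result will identify the minima among all stationary points.
Other stationary points are saddle points whose every neighborhood contains both
smaller and higher points.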
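Theorem 2.3 The only minima of Problem (13.2) are given by Theorem 2.1 and
are global minima. All other stationary points are saddle points.

Proof Let us assume that $A_r$ is a stationary point given by Theorem 2.2 but not by
Theorem 2.1. Then there always exists a permutation of the columns of P and Q,
and of the diagonal elements of $\hat{\Sigma}$ and $\hat{\Sigma}_r$ such that $\hat{\sigma}_{r+1} > \hat{\sigma}_r$. We then construct
two points in the $\epsilon$-neighborhood of $A_r$ that yield an increase and a decrease,
respectively, of the distance measure. They are obtained by taking:

$$\overline{\Sigma}_r(\epsilon) = \begin{pmatrix} \hat{\sigma}_1 + \epsilon & & & & \\ & \ddots & & & \\ & & \hat{\sigma}_r & & \\ & & & 0 & \\ & & & & \ddots \end{pmatrix}, \quad \overline{A}_r(\epsilon) = P \overline{\Sigma}_r(\epsilon) Q^T$$

and

$$\underline{\Sigma}_r(\epsilon) = \begin{pmatrix} \hat{\sigma}_1 & & & & & \\ & \ddots & & & & \\ & & \hat{\sigma}_r & \epsilon\sqrt{\hat{\sigma}_r} & & \\ & & \epsilon\sqrt{\hat{\sigma}_r} & \epsilon^2 & & \\ & & & & 0 & \\ & & & & & \ddots \end{pmatrix}, \quad \underline{A}_r(\epsilon) = P \underline{\Sigma}_r(\epsilon) Q^T.$$

Clearly $\overline{A}_r(\epsilon)$ and $\underline{A}_r(\epsilon)$ are of rank r. Evaluating the distance measure yields

$$\|A - \underline{A}_r(\epsilon)\|_F^2 = 2\hat{\sigma}_r \epsilon^2 + (\hat{\sigma}_{r+1} - \epsilon^2)^2 + \sum_{i=r+2}^n \hat{\sigma}_i^2
= \epsilon^2\big[\epsilon^2 - 2(\hat{\sigma}_{r+1} - \hat{\sigma}_r)\big] + \sum_{i=r+1}^n \hat{\sigma}_i^2
< \sum_{i=r+1}^n \hat{\sigma}_i^2 = \|A - A_r\|_F^2$$

for all $\epsilon \in \big(0, \sqrt{2(\hat{\sigma}_{r+1} - \hat{\sigma}_r)}\big)$ and

$$\|A - \overline{A}_r(\epsilon)\|_F^2 = \epsilon^2 + \sum_{i=r+1}^n \hat{\sigma}_i^2 > \sum_{i=r+1}^n \hat{\sigma}_i^2 = \|A - A_r\|_F^2$$

for all $\epsilon > 0$. Hence, for an arbitrarily small positive $\epsilon$, we obtain

$$\|A - \underline{A}_r(\epsilon)\|_F^2 < \|A - A_r\|_F^2 < \|A - \overline{A}_r(\epsilon)\|_F^2$$

which shows that $A_r$ is a saddle point of the distance measure. □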
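The saddle-point argument can be illustrated numerically in the simplest setting. This sketch assumes $P = Q = I$, r = 1 and unsorted singular values 1 and 3, so that the stationary point keeps the smaller one. The perturbations are an $\epsilon$ added to the kept singular value and a rank-one 2 x 2 block with off-diagonal entries $\epsilon\sqrt{\hat{\sigma}_r}$; the variable names are illustrative.

```python
import math


def sq_dist(A, B):
    """Squared Frobenius distance ||A - B||_F^2."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


s1, s2 = 1.0, 3.0          # unsorted singular values, s2 > s1, with r = 1
eps = 0.1

A = [[s1, 0.0], [0.0, s2]]
A_r = [[s1, 0.0], [0.0, 0.0]]                  # stationary point keeping s1
A_over = [[s1 + eps, 0.0], [0.0, 0.0]]         # perturbation that increases the distance
root = math.sqrt(s1)
A_under = [[s1, eps * root], [eps * root, eps ** 2]]  # rank-one perturbation that decreases it

d_r = sq_dist(A, A_r)
d_over = sq_dist(A, A_over)
d_under = sq_dist(A, A_under)

# The lower perturbation stays rank one: its 2 x 2 determinant vanishes.
det_under = A_under[0][0] * A_under[1][1] - A_under[0][1] * A_under[1][0]
```

Both perturbed points are within $O(\epsilon)$ of the stationary point, one on each side of its cost, which is the defining property of a saddle.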
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When we add a nonnegativity constraint in the next section, the results of this 
section will help to identify stationary points at which all the nonnegativity con- 
straints are inactive. 



13.3 Nonnegativity Constraint 

In this section, we investigate the problem of Nonnegative Matrix Factorization.
This problem differs from Problem (13.2) in the previous section because of the
additional nonnegativity constraints on the factors. We first discuss the effects of
adding such a constraint. By doing so, the problem is no longer easy because of the
existence of local minima at the boundary of the nonnegative orthant. Determining
the lowest minimum among these minima is far from trivial. On the other hand, a
minimum that coincides with a minimum of the unconstrained problem (i.e.
Problem (13.2)) may be easily reached by standard descent methods, as we will
see.

Problem 1 (Nonnegative matrix factorization, NMF) Given an $m \times n$ nonnegative
matrix A and an integer $r < \min\{m, n\}$, solve

$$\min_{U \in \mathbb{R}^{m \times r}_+,\ V \in \mathbb{R}^{n \times r}_+} \frac{1}{2} \|A - U V^T\|_F^2$$

where r is called the reduced rank. From now on, m and n will be used to denote
the size of the target matrix A and r is the reduced rank of a factorization.

We rewrite the nonnegative matrix factorization as a standard nonlinear
optimization problem:

$$\min_{-U \le 0,\ -V \le 0} \frac{1}{2} \|A - U V^T\|_F^2.$$

The associated Lagrangian function is

$$L(U, V, \mu, \nu) = \frac{1}{2} \|A - U V^T\|_F^2 - \mu \circ U - \nu \circ V,$$

where $\mu$ and $\nu$ are two matrices of the same size as U and V, respectively,
containing the Lagrange multipliers associated with the nonnegativity constraints
$U_{ij} \ge 0$ and $V_{ij} \ge 0$. Then the Karush-Kuhn-Tucker conditions for the nonnegative
matrix factorization problem say that if (U, V) is a local minimum, then there exist
$\mu_{ij} \ge 0$ and $\nu_{ij} \ge 0$ such that:

$$U \ge 0, \quad V \ge 0, \qquad (13.7)$$
$$\nabla L_U = 0, \quad \nabla L_V = 0, \qquad (13.8)$$

$$\mu \circ U = 0, \quad \nu \circ V = 0. \qquad (13.9)$$

Developing (13.8) we have:

$$U V^T V - A V - \mu = 0, \qquad V U^T U - A^T U - \nu = 0$$

or

$$\mu = U V^T V - A V, \qquad \nu = V U^T U - A^T U.$$

Combining this with $\mu_{ij} \ge 0$, $\nu_{ij} \ge 0$ and (13.9) gives the following conditions:

$$U \ge 0, \quad V \ge 0, \qquad (13.10)$$

$$\nabla F_U = U V^T V - A V \ge 0, \quad \nabla F_V = V U^T U - A^T U \ge 0, \qquad (13.11)$$

$$U \circ (U V^T V - A V) = 0, \quad V \circ (V U^T U - A^T U) = 0, \qquad (13.12)$$

where the corresponding Lagrange multipliers for U and V are also the gradients of
F with respect to U and V. Since the Euclidean distance is not convex with respect
to both variables U and V at the same time, these conditions are only necessary:
saddle points and maxima also satisfy them. We therefore call
all the points that satisfy the above conditions the stationary points.

Definition 1 (NMF stationary point) We call {U,V) a stationary point of the 
NMF Problem if and only if U and V satisfy the KKT conditions (13.10), (13.11) 
and (13.12). 
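The three KKT conditions translate directly into an elementwise test. The following is a minimal sketch of such a check, taking the gradients to be $U V^T V - A V$ and $V U^T U - A^T U$ as in (13.11); the function names, the tolerance and the test matrices are hypothetical.

```python
def matmul(P, Q):
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


def transpose(M):
    return [list(col) for col in zip(*M)]


def nmf_gradients(A, U, V):
    """Return (U V^T V - A V, V U^T U - A^T U)."""
    UVtV = matmul(U, matmul(transpose(V), V))
    AV = matmul(A, V)
    VUtU = matmul(V, matmul(transpose(U), U))
    AtU = matmul(transpose(A), U)
    gU = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(UVtV, AV)]
    gV = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(VUtU, AtU)]
    return gU, gV


def is_stationary(A, U, V, tol=1e-10):
    """Check nonnegativity, nonnegative gradients and complementarity elementwise."""
    gU, gV = nmf_gradients(A, U, V)
    for M, G in ((U, gU), (V, gV)):
        for row_m, row_g in zip(M, G):
            for m, g in zip(row_m, row_g):
                if m < -tol or g < -tol or abs(m * g) > tol:
                    return False
    return True


A = [[1.0, 2.0], [2.0, 4.0]]
exact = is_stationary(A, [[1.0], [2.0]], [[1.0], [2.0]])   # U V^T equals A
zero = is_stationary(A, [[0.0], [0.0]], [[0.0], [0.0]])    # trivial boundary point
other = is_stationary(A, [[1.0], [1.0]], [[1.0], [1.0]])   # gradient has negative entries
```

The zero factorization passes the test, which illustrates that the conditions are necessary but not sufficient for a minimum.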

Alternatively, a stationary point (U, V) of the NMF problem can also be defined
by using the following necessary condition (see for example [4]) on the convex
sets $\mathbb{R}^{m \times r}_+$ and $\mathbb{R}^{n \times r}_+$, that is

$$\left\langle \begin{pmatrix} \nabla F_U \\ \nabla F_V \end{pmatrix}, \begin{pmatrix} X - U \\ Y - V \end{pmatrix} \right\rangle \ge 0, \quad \forall X \in \mathbb{R}^{m \times r}_+,\ Y \in \mathbb{R}^{n \times r}_+, \qquad (13.13)$$

which can be shown to be equivalent to the KKT conditions (13.10), (13.11) and
(13.12). Indeed, it is trivial that the KKT conditions imply (13.13). And by
carefully choosing different values of X and Y in (13.13), one can easily prove
that the KKT conditions hold.

There are two values of the reduced rank r for which we can trivially identify the
global solution, namely r = 1 and $r = \min(m, n)$. For r = 1, a pair of dominant
singular vectors is a global minimizer. And for $r = \min(m, n)$, $(U = A, V = I)$ is
a global minimizer. Since most existing methods for the nonnegative matrix
factorization are descent algorithms, we should pay attention to all local
minimizers. For the rank-one case, they can easily be characterized.
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13.3.1 Rank One Case 

The rank-one NMF problem of a nonnegative matrix A can be rewritten as 

min -\\A-uv^\\l (13.14) 

«eR" veR'; 2 

and a complete analysis can be carried out. It is well known that any pair of 
nonnegative Perron vectors of AA^ and A^A yields a global minimizer of this 
problem, but we can also show that the only stationary points of (13.14) are given 
by such vectors. The following theorem excludes the case where u — and/or 

Theorem 3.1 The pair (m, v) is a local minimizer o/ (13.14) if and only if u and v 
are nonnegative eigenvectors of AA^ and A^A respectively of the eigenvalue 



a = 



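Proof The if part easily follows from Theorem 2.2. For the only if part we proceed as follows. Without loss of generality, we can permute the rows and columns of $A$ such that the corresponding vectors $u$ and $v$ are partitioned as $(u_+\ 0)$ and $(v_+\ 0)$ respectively, where $u_+, v_+ > 0$. Partition the corresponding matrix $A$ conformably as follows

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

then from (13.11) we have

$$\begin{pmatrix} u_+v_+^T & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} v_+ \\ 0 \end{pmatrix} - \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}\begin{pmatrix} v_+ \\ 0 \end{pmatrix} \ge 0$$

and

$$\begin{pmatrix} v_+u_+^T & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} u_+ \\ 0 \end{pmatrix} - \begin{pmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{pmatrix}\begin{pmatrix} u_+ \\ 0 \end{pmatrix} \ge 0,$$

implying that $A_{21}v_+ \le 0$ and $A_{12}^Tu_+ \le 0$. Since $A_{21}, A_{12} \ge 0$ and $u_+, v_+ > 0$, we can conclude that $A_{12} = 0$ and $A_{21} = 0$. Then from (13.12) we have:

$$u_+ \circ (\|v_+\|_2^2u_+ - A_{11}v_+) = 0 \quad\text{and}\quad v_+ \circ (\|u_+\|_2^2v_+ - A_{11}^Tu_+) = 0.$$

Since $u_+, v_+ > 0$, we have:

$$\|v_+\|_2^2u_+ = A_{11}v_+ \quad\text{and}\quad \|u_+\|_2^2v_+ = A_{11}^Tu_+$$

or

$$\|u_+\|_2^2\|v_+\|_2^2u_+ = A_{11}A_{11}^Tu_+ \quad\text{and}\quad \|u_+\|_2^2\|v_+\|_2^2v_+ = A_{11}^TA_{11}v_+.$$

Setting $\sigma = \|u_+\|_2^2\|v_+\|_2^2$ and using the block diagonal structure of $A$ yields the desired result. □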

Theorem 3.1 guarantees that all stationary points of the rank-one case are 
nonnegative singular vectors of a submatrix of A. These results imply that a global 
minimizer of the rank-one NMF can be calculated correctly based on the largest 
singular value and corresponding singular vectors of the matrix A. 
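
As a minimal sketch of this statement, the following Python code approximates the dominant singular pair of a small nonnegative matrix by power iteration. Started from a positive vector, the iterates stay nonnegative, so the resulting pair is feasible for the rank-one NMF. The function name `rank_one_nmf`, the iteration count and the test matrix are illustrative assumptions, and the sketch assumes $A \ne 0$.

```python
import math

def rank_one_nmf(A, iters=200):
    """Power iteration for the dominant singular pair of a nonnegative
    matrix A (list of rows). Returns (u, v) with A ~ u v^T, ||u||_2 = 1."""
    m, n = len(A), len(A[0])
    v = [1.0] * n                      # positive start keeps iterates nonnegative
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
        nu = math.sqrt(sum(x * x for x in u))
        u = [x / nu for x in u]                                        # normalize u
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # v = A^T u
    return u, v

A = [[2.0, 1.0], [1.0, 2.0]]           # singular values 3 and 1
u, v = rank_one_nmf(A)
err = sum((A[i][j] - u[i] * v[j]) ** 2 for i in range(2) for j in range(2))
```

For this matrix the squared error of the rank-one approximation equals the square of the second singular value.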

For ranks other than 1 and min(m,n), there are no longer trivial stationary 
points. In the next section, we try to derive some simple characteristics of the local 
minima of the nonnegative matrix factorization. 

The KKT conditions (13.12) help to characterize the stationary points of the NMF problem. Summing up all the elements of one of the conditions (13.12), we get:

$$0 = \sum_{ij} \left(U \circ (UV^TV - AV)\right)_{ij} = \langle UV^T, UV^T - A \rangle. \qquad (13.15)$$

From that, we have some simple characteristics of the NMF solutions: 

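Theorem 3.2 Let $(U, V)$ be a stationary point of the NMF problem, then $UV^T \in \mathcal{B}\left(\frac{A}{2}, \frac{1}{2}\|A\|_F\right)$, the ball centered at $\frac{A}{2}$ and with radius $\frac{1}{2}\|A\|_F$.

Proof From (13.15) it immediately follows that

$$\left\langle \frac{A}{2} - UV^T, \frac{A}{2} - UV^T \right\rangle = \left\langle \frac{A}{2}, \frac{A}{2} \right\rangle$$

which implies

$$UV^T \in \mathcal{B}\left(\frac{A}{2}, \frac{1}{2}\|A\|_F\right). \qquad □$$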

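Theorem 3.3 Let $(U, V)$ be a stationary point of the NMF problem, then

$$\frac{1}{2}\|A - UV^T\|_F^2 = \frac{1}{2}\left(\|A\|_F^2 - \|UV^T\|_F^2\right).$$

Proof From (13.15), we have $\langle UV^T, A \rangle = \langle UV^T, UV^T \rangle$. Therefore,

$$\frac{1}{2}\langle A - UV^T, A - UV^T \rangle = \frac{1}{2}\left(\|A\|_F^2 - 2\langle UV^T, A \rangle + \|UV^T\|_F^2\right) = \frac{1}{2}\left(\|A\|_F^2 - \|UV^T\|_F^2\right). \qquad □$$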
Theorem 3.3 also suggests that at a stationary point $(U, V)$ of the NMF problem, we should have $\|A\|_F^2 \ge \|UV^T\|_F^2$. This norm inequality can also be found in [6] for less general cases where we have $\nabla F_U = 0$ and $\nabla F_V = 0$ at a stationary point. For this particular class of NMF stationary points, all the nonnegativity constraints on $U$ and $V$ are inactive. And all such stationary points are also stationary points of the unconstrained problem, characterized by Theorem 2.2.

We have seen in Theorem 2.2 that, for the unconstrained least squares problem, the only stable stationary points are in fact global minima. Therefore, if the stationary points of the constrained problem are inside the nonnegative orthant (i.e. all constraints are inactive), we can then probably reach the global minimum of the NMF problem. This can be expected because the constraints may no longer prohibit the descent of the update.

Let $A_r$ be the optimal rank-$r$ approximation of a nonnegative matrix $A$, which we obtain from the singular value decomposition, as indicated in Theorem 2.2. Then we can easily construct its nonnegative part $[A_r]_+$, which is obtained from $A_r$ by just setting all its negative elements equal to zero. This is in fact the closest matrix in the cone of nonnegative matrices to the matrix $A_r$, in the Frobenius norm (in that sense, it is its projection on that cone). We now derive some bounds for the error $\|A - [A_r]_+\|_F$.

Theorem 3.4 Let $A_r$ be the best rank $r$ approximation of a nonnegative matrix $A$, and let $[A_r]_+$ be its nonnegative part, then

$$\|A - [A_r]_+\|_F \le \|A - A_r\|_F.$$

Proof This follows easily from the convexity of the cone of nonnegative matrices. Since both $A$ and $[A_r]_+$ are nonnegative and since $[A_r]_+$ is the closest matrix in that cone to $A_r$ we immediately obtain the inequality

$$\|A - A_r\|_F^2 \ge \|A - [A_r]_+\|_F^2 + \|A_r - [A_r]_+\|_F^2 \ge \|A - [A_r]_+\|_F^2$$

from which the result readily follows. □

The approximation $[A_r]_+$ has the merit of requiring as much storage as a rank $r$ approximation, even though its rank is larger than $r$ whenever $A_r \ne [A_r]_+$. We will look at the quality of this approximation in Sect. 13.6. If we now compare this bound with the nonnegative approximations then we obtain the following inequalities. Let $U_*V_*^T$ be an optimal nonnegative rank $r$ approximation of $A$ and let $UV^T$ be any stationary point of the KKT conditions for a nonnegative rank $r$ approximation, then we have:

$$\|A - [A_r]_+\|_F^2 \le \|A - A_r\|_F^2 = \sum_{i=r+1}^{\min(m,n)} \sigma_i^2 \le \|A - U_*V_*^T\|_F^2 \le \|A - UV^T\|_F^2.$$

For more implications of the NMF problem, see [13].
13.4 Existing Descent Algorithms

We focus on descent algorithms that guarantee a nonincreasing update at each iteration. Based on the search space, we have two categories: full-space search and (block) coordinate search.

Algorithms in the former category try to find updates for both $U$ and $V$ at the same time. This requires a search for a descent direction in the $(m + n)r$-dimensional space. Note also that the NMF problem in this full space is not convex but the optimality conditions may be easier to achieve.

Algorithms in the latter category, on the other hand, find updates for each (block) coordinate in order to guarantee the descent of the objective function. Usually, search subspaces are chosen to make the objective function convex so that efficient methods can be applied. Such a simplification might lead to the loss of some convergence properties. Most of the algorithms use the following column partitioning:

$$\frac{1}{2}\|A - UV^T\|_F^2 = \frac{1}{2}\sum_{i=1}^n \|A_{:,i} - U(V_{i,:})^T\|_2^2 \qquad (13.16)$$

which shows that one can minimize with respect to each of the rows of $V$ independently. The problem thus decouples into smaller convex problems. This leads to the solution of quadratic problems of the form

$$\min_{v \ge 0} \frac{1}{2}\|a - Uv\|_2^2. \qquad (13.17)$$

Updates for the rows of $V$ are then alternated with updates for the rows of $U$ in a similar manner by transposing $A$ and $UV^T$.

Independently of the search space, most algorithms use the projected gradient scheme, in which three basic steps are carried out in each iteration:

• Calculating the gradient $\nabla F(x^k)$.
• Choosing the step size $\alpha^k$.
• Projecting the update on the nonnegative orthant:

$$x^{k+1} = [x^k - \alpha^k \nabla F(x^k)]_+,$$

where $x^k$ is the variable in the selected search space. The last two steps can be merged in one iterative process and must guarantee a sufficient decrease of the objective function as well as the nonnegativity of the new point.
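
The three steps above can be sketched for the elementary problem (13.17) as follows. This is only an illustration with a fixed step size. The helper names `grad` and `projected_gradient_step`, the step size 0.5 and the data are assumptions made for the example, and no sufficient-decrease test is performed.

```python
def grad(U, a, v):
    """Gradient U^T (U v - a) of F(v) = 0.5 * ||a - U v||_2^2."""
    m, r = len(U), len(v)
    res = [sum(U[i][j] * v[j] for j in range(r)) - a[i] for i in range(m)]
    return [sum(U[i][j] * res[i] for i in range(m)) for j in range(r)]

def projected_gradient_step(U, a, v, alpha):
    """One iteration v_new = [v - alpha * grad F(v)]_+ (fixed step alpha)."""
    g = grad(U, a, v)
    return [max(0.0, vj - alpha * gj) for vj, gj in zip(v, g)]

U = [[1.0, 0.0], [0.0, 1.0]]   # identity, so the minimizer is [a]_+
a = [2.0, -1.0]
v = [0.0, 0.0]
for _ in range(50):
    v = projected_gradient_step(U, a, v, alpha=0.5)
```

With $U = I$ the constrained minimizer is simply the nonnegative part of $a$, which the iteration approaches while the second coordinate stays clamped at zero.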
13.4.1 Multiplicative Rules (Mult)

Multiplicative rules were introduced in [18]. The algorithm applies a block coordinate type search and uses the above column partition to formulate the updates. A special feature of this method is that the step size is calculated for each element of the vector. For the elementary problem (13.17) it is given by

$$v^{k+1} = v^k - \alpha^k \circ \nabla F(v^k) = v^k \circ \frac{[U^Ta]}{[U^TUv^k]}$$

where $[\alpha^k]_i = \frac{v_i^k}{[U^TUv^k]_i}$ and the division is elementwise. Applying this to all rows of $V$ and $U$ gives the updating rule of Algorithm 1 to compute

$$(U^*, V^*) = \operatorname*{argmin}_{U \ge 0,\ V \ge 0} \|A - UV^T\|_F^2.$$



Algorithm 1 (Mult)

Initialize $U^0$, $V^0$ and $k = 0$
repeat
    $U^{k+1} = U^k \circ \frac{[AV^k]}{[U^k(V^k)^T(V^k)]}$
    $V^{k+1} = V^k \circ \frac{[A^TU^{k+1}]}{[V^k(U^{k+1})^T(U^{k+1})]}$
    $k = k + 1$
until Stopping condition



These updates guarantee automatically the nonnegativity of the factors but may 
fail to give a sufficient decrease of the objective function. It may also get stuck in a 
non-stationary point and hence suffer from a poor convergence. Variants can be 
found in [20, 22]. 
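
A minimal pure-Python sketch of one sweep of the multiplicative rules is given below. The helper names, the small guard `eps` added to the denominators (to avoid a division by zero) and the test data are assumptions of this sketch and are not part of Algorithm 1.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(c) for c in zip(*X)]

def frob_err(A, U, V):
    """Squared Frobenius error ||A - U V^T||_F^2."""
    P = matmul(U, transpose(V))
    return sum((a - p) ** 2 for ra, rp in zip(A, P) for a, p in zip(ra, rp))

def mult_sweep(A, U, V, eps=1e-12):
    """One sweep: update U elementwise, then V using the new U."""
    AV, UVtV = matmul(A, V), matmul(U, matmul(transpose(V), V))
    U = [[u * n / (d + eps) for u, n, d in zip(ru, rn, rd)]
         for ru, rn, rd in zip(U, AV, UVtV)]
    AtU, VUtU = matmul(transpose(A), U), matmul(V, matmul(transpose(U), U))
    V = [[v * n / (d + eps) for v, n, d in zip(rv, rn, rd)]
         for rv, rn, rd in zip(V, AtU, VUtU)]
    return U, V

A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [0.5, 1.0, 1.0]]
U = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
V = [[1.0, 0.2], [0.3, 1.0], [0.6, 0.7]]
errors = [frob_err(A, U, V)]
for _ in range(20):
    U, V = mult_sweep(A, U, V)
    errors.append(frob_err(A, U, V))
```

The recorded errors illustrate the nonincreasing behaviour of the updates. Nothing in this sketch checks how close the final pair is to a stationary point.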



13.4.2 Line Search Using Armijo Criterion (Line) 

In order to ensure a sufficient descent, the following projected gradient scheme with Armijo criterion [19, 21] can be applied to minimize

$$x^* = \operatorname*{argmin}_{x \ge 0} F(x).$$

Algorithm 2 needs two parameters $\sigma$ and $\beta$ that may affect its convergence. It requires only the gradient information, and is applied in [19] for two different strategies: for the whole space $(U, V)$ (Algorithm FLine) and for $U$ and $V$ separately in an alternating fashion (Algorithm CLine). With a good choice of parameters ($\sigma = 0.01$ and $\beta = 0.1$) and a good strategy of alternating between variables, it was reported in [19] to be faster than the multiplicative rules.

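Algorithm 2 (Line)

Initialize $x^1$, $\sigma$, $\beta$, $\alpha_0 = 1$ and $k = 1$
repeat
    $\alpha_k = \alpha_{k-1}$
    $y = [x^k - \alpha_k \nabla F(x^k)]_+$
    if $F(y) - F(x^k) > \sigma \langle \nabla F(x^k), y - x^k \rangle$ then
        repeat
            $\alpha_k = \alpha_k \cdot \beta$
            $y = [x^k - \alpha_k \nabla F(x^k)]_+$
        until $F(y) - F(x^k) \le \sigma \langle \nabla F(x^k), y - x^k \rangle$
    else
        repeat
            lasty $= y$
            $\alpha_k = \alpha_k / \beta$
            $y = [x^k - \alpha_k \nabla F(x^k)]_+$
        until $F(y) - F(x^k) > \sigma \langle \nabla F(x^k), y - x^k \rangle$
        $y =$ lasty
    end if
    $x^{k+1} = y$
    $k = k + 1$
until Stopping condition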
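
One outer iteration of this line search can be sketched in Python as follows. The function `armijo_step` and its arguments are hypothetical names. The cap `max_tries` and the extra exit when the projected trial point stops changing are safeguards added for this sketch, not steps of Algorithm 2.

```python
def project(x):
    return [max(0.0, xi) for xi in x]

def armijo_step(F, gradF, x, alpha, sigma=0.01, beta=0.1, max_tries=50):
    """Return (x_new, alpha) after one projected Armijo line search."""
    g, Fx = gradF(x), F(x)

    def trial(a):
        return project([xi - a * gi for xi, gi in zip(x, g)])

    def sufficient(y):
        slope = sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))
        return F(y) - Fx <= sigma * slope

    y = trial(alpha)
    if not sufficient(y):
        for _ in range(max_tries):          # shrink the step until it is accepted
            alpha *= beta
            y = trial(alpha)
            if sufficient(y):
                break
    else:
        for _ in range(max_tries):          # enlarge the step while still accepted
            last_y, last_alpha = y, alpha
            alpha /= beta
            y = trial(alpha)
            if not sufficient(y) or y == last_y:
                y, alpha = last_y, last_alpha
                break
    return y, alpha

c = [1.0, -2.0]
F = lambda x: 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
gradF = lambda x: [xi - ci for xi, ci in zip(x, c)]
x, alpha = [0.0, 1.0], 1.0
for _ in range(5):
    x, alpha = armijo_step(F, gradF, x, alpha)
```

The example minimizes $\frac{1}{2}\|x - c\|_2^2$ over the nonnegative orthant, whose solution is the nonnegative part of $c$.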



13.4.3 Projected Gradient with First-Order Approximation (FO) 

In order to find the solution to

$$x^* = \operatorname*{argmin}_{x \ge 0} F(x)$$

we can also approximate at each iteration the function $F(x)$ using:

$$\tilde{F}(x) = F(x^k) + \langle \nabla_x F(x^k), x - x^k \rangle + \frac{L}{2}\|x^k - x\|_2^2$$

where $L$ is a Lipschitz constant satisfying $F(x) \le \tilde{F}(x)$, $\forall x$. Because of this inequality, the solution of the following problem

$$x^{k+1} = \operatorname*{argmin}_{x \ge 0} \tilde{F}(x)$$

also is a point of descent for the function $F(x)$ since

$$F(x^{k+1}) \le \tilde{F}(x^{k+1}) \le \tilde{F}(x^k) = F(x^k).$$

Since the constant $L$ is not known a priori, an inner loop is needed. Algorithm 3 presents an iterative way to carry out this scheme. As in the previous algorithm, this also requires only the gradient information and can therefore be applied to two different strategies: to the whole space $(U, V)$ (Algorithm FFO) and to $U$ and $V$ separately in an alternating fashion (Algorithm CFO).

A main difference with the previous algorithm is its stopping criterion for the inner loop. This algorithm also requires a parameter $\beta$ for which the practical choice is 2.

13.4.4 Alternative Least Squares Methods 

The first algorithm proposed for solving the nonnegative matrix factorization was the alternative least squares method [24]. It is known that, fixing either $U$ or $V$, the problem becomes a least squares problem with nonnegativity constraint.

Since the least squares problems in Algorithm 4 can be perfectly decoupled into smaller problems corresponding to the columns or rows of $A$, we can directly apply methods for the Nonnegative Least Squares problem to each of the small problems. Methods that can be applied are [5, 17], etc.

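Algorithm 3 (FO)

Initialize $x^0$, $L_0$ and $k = 0$
repeat
    $y = [x^k - \frac{1}{L_k}\nabla F(x^k)]_+$
    while $F(y) - F(x^k) > \langle \nabla F(x^k), y - x^k \rangle + \frac{L_k}{2}\|y - x^k\|_2^2$ do
        $L_k = L_k \cdot \beta$
        $y = [x^k - \frac{1}{L_k}\nabla F(x^k)]_+$
    end while
    $x^{k+1} = y$
    $L_{k+1} = L_k / \beta$
    $k = k + 1$
until Stopping condition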



Algorithm 4 Alternative Least Square (ALS)

Initialize $U$ and $V$
repeat
    Solve: $\min_{V \ge 0} \frac{1}{2}\|A - UV^T\|_F^2$
    Solve: $\min_{U \ge 0} \frac{1}{2}\|A^T - VU^T\|_F^2$
until Stopping condition
13.4.5 Implementation

The most time-consuming job is the test for the sufficient decrease, which is also the stopping condition for the inner loop. As mentioned at the beginning of the section, the above methods can be carried out using two different strategies: full space search or coordinate search. In some cases, it is required to evaluate repeatedly the function $F(U, V)$. We mention here how to do this efficiently with the coordinate search.

Full space search: The exact evaluation of $F(x) = F(U, V) = \|A - UV^T\|_F^2$ needs $O(mnr)$ operations. When there is a correction $y = (U + \Delta U, V + \Delta V)$, we have to calculate $F(y)$, which also requires $O(mnr)$ operations. Hence, it requires $O(tmnr)$ operations to determine a stepsize in $t$ iterations of the inner loop.

Coordinate search: when $V$ is fixed, the Euclidean distance is a quadratic function of $U$:

$$F(U) = \|A - UV^T\|_F^2 = \langle A, A \rangle - 2\langle UV^T, A \rangle + \langle UV^T, UV^T \rangle = \|A\|_F^2 - 2\langle U, AV \rangle + \langle U, U(V^TV) \rangle.$$

The most expensive step is the computation of $AV$, which requires $O(mnr)$ operations. But when $V$ is fixed, $AV$ can be calculated once at the beginning of the inner loop. The remaining computations are $\langle U, AV \rangle$ and $\langle U, U(V^TV) \rangle$, which require $O(mr)$ and $O(mr^2 + mr)$ operations. Therefore, it requires $O(tmr^2)$ operations to determine a stepsize in $t$ iterations of the inner loop, which is much less than $O(tmnr)$ operations. This is due to the assumption $r \ll n$. Similarly, when $U$ is fixed, $O(tnr^2)$ operations are needed to determine a stepsize.
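
The saving described for the coordinate search can be illustrated with a short Python sketch: the products $AV$ and $V^TV$ are formed once, after which each evaluation of $F(U)$ no longer touches $A$. The names `make_F`, `inner` and the sample matrices are assumptions chosen for the illustration.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(c) for c in zip(*X)]

def inner(X, Y):
    """Matrix inner product <X, Y> = sum_ij X_ij * Y_ij."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def make_F(A, V):
    """Precompute ||A||_F^2, AV and V^T V once; return a cheap evaluator of
    F(U) = ||A - U V^T||_F^2 for the fixed V."""
    normA2 = inner(A, A)
    AV = matmul(A, V)                  # the expensive product, done once
    VtV = matmul(transpose(V), V)      # small r x r matrix, done once
    def F(U):
        return normA2 - 2.0 * inner(U, AV) + inner(U, matmul(U, VtV))
    return F

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
U = [[1.0, 2.0], [0.0, 1.0]]
fast = make_F(A, V)(U)
P = matmul(U, transpose(V))
direct = sum((a - p) ** 2 for ra, rp in zip(A, P) for a, p in zip(ra, rp))
```

Both values agree. Only the evaluator returned by `make_F` would be called inside an inner loop.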

If we consider an iteration to be a sweep, i.e. once all the variables are updated, the following table summarizes the complexity of each sweep of the described algorithms:

Algorithm    Complexity per iteration
Mult         $O(mnr)$
FLine        $O(tmnr)$
CLine        $O(t_1nr^2 + t_2mr^2)$
FFO          $O(tmnr)$
CFO          $O(t_1nr^2 + t_2mr^2)$
ALS          $O(2^rmnr)$*
IALS         $O(mnr)$



where $t$, $t_1$ and $t_2$ are the numbers of iterations of inner loops, which cannot be bounded in general. For algorithm ALS, the complexity is reported for the case where the active set method [17] is used. Although $O(2^rmnr)$ is a very high theoretical upper bound that counts all the possible subsets of $r$ variables of each subproblem, in practice, the active set method needs far fewer iterations to converge. One might as well use more efficient convex optimization tools to solve the subproblems instead of the active set method.



13.4.6 Scaling and Stopping Criterion 



For descent methods, several stopping conditions are used in the literature. We 
now discuss some problems when implementing these conditions for NMF. 

The very first condition is the decrease of the objective function. The algorithm should stop when it fails to make the objective function decrease by a certain amount:

$$F(U^k, V^k) - F(U^{k+1}, V^{k+1}) \le \epsilon \quad\text{or}\quad \frac{F(U^k, V^k) - F(U^{k+1}, V^{k+1})}{F(U^k, V^k)} \le \epsilon.$$



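This is not a good choice for all cases since the algorithm may stop at a point very far from a stationary point. Time and iteration bounds can also be imposed for very slowly converging algorithms. But here again this may not be good for the optimality conditions. A better choice is probably the norm of the projected gradient as suggested in [19]. For the NMF problem it is defined as follows:

$$[\nabla_X^P]_{ij} = \begin{cases} [\nabla_X]_{ij} & \text{if } X_{ij} > 0 \\ \min(0, [\nabla_X]_{ij}) & \text{if } X_{ij} = 0 \end{cases}$$

where $X$ stands for $U$ or $V$. The proposed condition then becomes

$$\left\| \begin{pmatrix} \nabla_{U^k}^P \\ \nabla_{V^k}^P \end{pmatrix} \right\|_F \le \epsilon \left\| \begin{pmatrix} \nabla_{U^1} \\ \nabla_{V^1} \end{pmatrix} \right\|_F. \qquad (13.18)$$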
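
The entrywise definition of the projected gradient translates directly into code. The sketch below uses hypothetical helper names and small hand-picked matrices `X` (the factor) and `G` (its gradient). It only illustrates the case distinction and does not compute an actual NMF gradient.

```python
import math

def projected_gradient(X, G):
    """Keep G_ij where X_ij > 0; use min(0, G_ij) where X_ij = 0."""
    return [[g if x > 0 else min(0.0, g) for x, g in zip(rx, rg)]
            for rx, rg in zip(X, G)]

def frob(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in M for v in row))

X = [[1.0, 0.0], [0.0, 2.0]]
G = [[3.0, 4.0], [-1.0, 0.0]]
P = projected_gradient(X, G)
norm_P = frob(P)
```

The positive gradient entry at a zero of `X` is discarded because moving against it would leave the feasible set, while the negative one is kept.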



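We should also take into account the scaling invariance between $U$ and $V$. Putting $\bar{U} = \gamma U$ and $\bar{V} = \frac{1}{\gamma}V$ does not change the approximation $UV^T$ but the above projected gradient norm is affected:

$$\left\| \begin{pmatrix} \nabla_{\bar U}^P \\ \nabla_{\bar V}^P \end{pmatrix} \right\|_F^2 = \|\nabla_{\bar U}^P\|_F^2 + \|\nabla_{\bar V}^P\|_F^2 = \frac{1}{\gamma^2}\|\nabla_U^P\|_F^2 + \gamma^2\|\nabla_V^P\|_F^2 \ne \left\| \begin{pmatrix} \nabla_U^P \\ \nabla_V^P \end{pmatrix} \right\|_F^2. \qquad (13.19)$$

Two approximate factorizations $UV^T = \bar{U}\bar{V}^T$ resulting in the same approximation should be considered equivalent in terms of precision. One could choose $\gamma^2 := \|\nabla_U^P\|_F / \|\nabla_V^P\|_F$, which minimizes (13.19) and forces $\|\nabla_{\bar U}^P\|_F = \|\nabla_{\bar V}^P\|_F$, but this may not be a good choice when only one of the gradients $\|\nabla_U^P\|_F$ and $\|\nabla_V^P\|_F$ is nearly zero.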
In fact, the gradient $\begin{pmatrix} \nabla_U \\ \nabla_V \end{pmatrix}$ is scale dependent in the NMF problem and any stopping criterion that uses gradient information is affected by this scaling. To limit that effect, we suggest the following scaling after each iteration:

$$\tilde{U}_k \leftarrow U_kD_k, \qquad \tilde{V}_k \leftarrow V_kD_k^{-1}$$

where $D_k$ is a positive diagonal matrix:

$$[D_k]_{ii} = \sqrt{\frac{\|V_{:i}\|_2}{\|U_{:i}\|_2}}.$$

This ensures that $\|\tilde{U}_{:i}\|_F = \|\tilde{V}_{:i}\|_F$ and hopefully also reduces the difference between $\|\nabla_{\tilde U}^P\|_F$ and $\|\nabla_{\tilde V}^P\|_F$. Moreover, it may help to avoid some numerically unstable situations.

The same scaling should be applied to the initial point $(U_1, V_1)$ as well when using (13.18) as the stopping condition.
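
A small Python sketch of such a column balancing is shown below. It assumes that the diagonal entries are chosen as $d_i = \sqrt{\|V_{:i}\|_2 / \|U_{:i}\|_2}$, which is the choice that equalizes the norms of corresponding columns while leaving $UV^T$ unchanged. The function name and the handling of zero columns are assumptions of the sketch.

```python
import math

def balance_columns(U, V):
    """Scale column i of U by d_i and column i of V by 1/d_i so that both
    columns end up with the same Euclidean norm; U V^T is unchanged."""
    r = len(U[0])
    U2 = [row[:] for row in U]
    V2 = [row[:] for row in V]
    for i in range(r):
        nu = math.sqrt(sum(row[i] ** 2 for row in U))
        nv = math.sqrt(sum(row[i] ** 2 for row in V))
        if nu == 0.0 or nv == 0.0:
            continue                   # zero column: left untouched (assumption)
        d = math.sqrt(nv / nu)
        for row in U2:
            row[i] *= d
        for row in V2:
            row[i] /= d
    return U2, V2

U = [[4.0], [0.0]]
V = [[1.0], [0.0], [0.0]]
U2, V2 = balance_columns(U, V)
```

In the example the column norms 4 and 1 both become 2, and the product of the two columns is the same before and after scaling.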



13.5 Rank-One Residue Iteration 

In the previous section, we have seen that it is very appealing to decouple the problem into convex subproblems. But this may "converge" to solutions that are far from the global minimizers of the problem.

In this section, we analyze a different decoupling of the problem based on rank-one approximations. This also allows us to formulate a very simple basic subproblem. This scheme has a major advantage over other methods: the subproblems can be optimally solved in closed form. Therefore, a strong convergence result can be proved through its damped version, and the scheme can be extended to more general types of factorizations, such as for nonnegative tensors, and to some practical constraints such as sparsity and smoothness. Moreover, the experiments in Sect. 13.6 suggest that this method outperforms the other ones in most cases. During the completion of the revised version of this report, we were informed that an independent report [9] had also proposed this decoupling without any convergence investigation or extensions.



13.5.1 New Partition of Variables 

Let the $u_i$'s and $v_i$'s be respectively the columns of $U$ and $V$. Then the NMF problem can be rewritten as follows:

Problem 2 (Nonnegative Matrix Factorization) Given an $m \times n$ nonnegative matrix $A$, solve

$$\min_{u_i \ge 0,\ v_i \ge 0} \frac{1}{2}\left\|A - \sum_{i=1}^r u_iv_i^T\right\|_F^2.$$

Let us fix all the variables, except for a single vector $v_t$, and consider the following least squares problem:

$$\min_{v \ge 0} \frac{1}{2}\|R_t - u_tv^T\|_F^2, \qquad (13.20)$$

where $R_t = A - \sum_{i \ne t} u_iv_i^T$. We have:

$$\|R_t - u_tv^T\|_F^2 = \operatorname{trace}\left[(R_t - u_tv^T)^T(R_t - u_tv^T)\right] \qquad (13.21)$$

$$\|R_t - u_tv^T\|_F^2 = \|R_t\|_F^2 - 2v^TR_t^Tu_t + \|u_t\|_2^2\|v\|_2^2. \qquad (13.22)$$

From this formulation, one now derives the following lemma.

Lemma 5.1 If $[R_t^Tu_t]_+ \ne 0$, then $v_* := \frac{[R_t^Tu_t]_+}{\|u_t\|_2^2}$ is the unique global minimizer of (13.20) and the value of $\|R_t - u_tv_*^T\|_F^2$ equals $\|R_t\|_F^2 - \frac{\|[R_t^Tu_t]_+\|_2^2}{\|u_t\|_2^2}$.

Proof Let us permute the elements of the vectors $x := R_t^Tu_t$ and $v$ such that

$$Px = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad Pv = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \quad\text{with } x_1 \ge 0,\ x_2 < 0$$

and $P$ is the permutation matrix. Then

$$\|R_t - u_tv^T\|_F^2 = \|R_t\|_F^2 - 2v_1^Tx_1 - 2v_2^Tx_2 + \|u_t\|_2^2(v_1^Tv_1 + v_2^Tv_2).$$

Since $x_2 < 0$ and $v_2 \ge 0$, it is obvious that $\|R_t - u_tv^T\|_F^2$ can only be minimal if $v_2 = 0$. Our assumption implies that $x_1$ is nonempty and $x_1 \ge 0$. Moreover, $[R_t^Tu_t]_+ \ne 0$ and $u_t \ge 0$ imply $\|u_t\|_2^2 > 0$. One can then find the optimal $v_1$ by minimizing the remaining quadratic function

$$\|R_t\|_F^2 - 2v_1^Tx_1 + \|u_t\|_2^2v_1^Tv_1$$

which yields the solution $v_1 = \frac{x_1}{\|u_t\|_2^2}$. Putting the two components together yields the result

$$v_* = \frac{[R_t^Tu_t]_+}{\|u_t\|_2^2} \quad\text{and}\quad \|R_t - u_tv_*^T\|_F^2 = \|R_t\|_F^2 - \frac{\|[R_t^Tu_t]_+\|_2^2}{\|u_t\|_2^2}. \qquad □$$
Algorithm 5 (RRI)

Initialize $u_i$'s, $v_i$'s, for $i = 1$ to $r$
repeat
    for $t = 1$ to $r$ do
        $R_t = A - \sum_{i \ne t} u_iv_i^T$
        if $[R_t^Tu_t]_+ \ne 0$ then
            $v_t \leftarrow \frac{[R_t^Tu_t]_+}{\|u_t\|_2^2}$
        else
            $v_t = 0$
        end if
        if $[R_tv_t]_+ \ne 0$ then
            $u_t \leftarrow \frac{[R_tv_t]_+}{\|v_t\|_2^2}$
        else
            $u_t = 0$
        end if
    end for
until Stopping condition
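
One sweep of the RRI scheme can be sketched in pure Python as below. For readability the residue $R_t$ is formed explicitly, which is more expensive than necessary. The function names and the test data are assumptions made for this illustration.

```python
def rri_sweep(A, U, V):
    """One RRI sweep over the r column pairs; U is m x r, V is n x r
    (lists of rows). U and V are updated in place and returned."""
    m, n, r = len(A), len(A[0]), len(U[0])
    for t in range(r):
        # Residue R_t = A - sum_{i != t} u_i v_i^T (formed explicitly here)
        R = [[A[i][j] - sum(U[i][k] * V[j][k] for k in range(r) if k != t)
              for j in range(n)] for i in range(m)]
        u = [U[i][t] for i in range(m)]
        # v_t <- [R_t^T u_t]_+ / ||u_t||^2, or 0 if the positive part vanishes
        x = [max(0.0, sum(R[i][j] * u[i] for i in range(m))) for j in range(n)]
        v = [xj / sum(ui * ui for ui in u) for xj in x] if any(x) else [0.0] * n
        # u_t <- [R_t v_t]_+ / ||v_t||^2, or 0 if the positive part vanishes
        y = [max(0.0, sum(R[i][j] * v[j] for j in range(n))) for i in range(m)]
        u = [yi / sum(vj * vj for vj in v) for yi in y] if any(y) else [0.0] * m
        for i in range(m):
            U[i][t] = u[i]
        for j in range(n):
            V[j][t] = v[j]
    return U, V

def sq_error(A, U, V):
    return sum((A[i][j] - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [0.5, 1.0, 1.0]]
U = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
V = [[1.0, 0.2], [0.3, 1.0], [0.6, 0.7]]
errors = [sq_error(A, U, V)]
for _ in range(10):
    U, V = rri_sweep(A, U, V)
    errors.append(sq_error(A, U, V))
```

Because each update solves its subproblem exactly, the recorded errors form a nonincreasing sequence.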



Remark 1 The above lemma has of course a dual form, where one fixes $v_t$ but solves for the optimal $u$ to minimize $\|R_t - uv_t^T\|_F^2$. This would yield the updating rules

$$v_t \leftarrow \frac{[R_t^Tu_t]_+}{\|u_t\|_2^2} \quad\text{and}\quad u_t \leftarrow \frac{[R_tv_t]_+}{\|v_t\|_2^2} \qquad (13.23)$$

which can be used to recursively update approximations $\sum_{i=1}^r u_iv_i^T$ by modifying each rank-one matrix $u_tv_t^T$ in a cyclic manner. This problem is different from the NMF, since the error matrices $R_t = A - \sum_{i \ne t} u_iv_i^T$ are no longer nonnegative. We will therefore call this method the Rank-one Residue Iteration (RRI), i.e. Algorithm 5. The same algorithm was independently reported as Hierarchical Alternating Least Squares (HALS) [7].

Remark 2 In the case where $[R_t^Tu_t]_+ = 0$, we have a trivial solution $v = 0$ that is not covered by Lemma 5.1. In addition, if $u_t = 0$, this solution is no longer unique. In fact, $v$ can be arbitrarily taken to construct a rank-deficient approximation. The effect of this on the convergence of the algorithm will be discussed further in the next section.

Remark 3 Notice that the optimality of Lemma 5.1 implies that $\|A - UV^T\|$ cannot increase. And since $A \ge 0$ is fixed, $UV^T \ge 0$ must be bounded. Therefore, its components $u_iv_i^T$ ($i = 1 \ldots r$) must be bounded as well. One can moreover scale the vector pairs $(u_i, v_i)$ at each stage as explained in Sect. 13.4 without affecting the local optimality of Lemma 5.1. It then follows that the rank-one products $u_iv_i^T$ and their scaled vectors remain bounded.



13.5.2 Convergence 



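In the previous section, we have established the partial updates for each of the variables $u_t$ or $v_t$. And for an NMF problem where the reduced rank is $r$, we have in total $2r$ vector variables (the $u_t$'s and $v_t$'s). The described algorithm can also be considered as a projected gradient method since the update (13.23) can be rewritten as:

$$u_t \leftarrow \frac{[R_tv_t]_+}{\|v_t\|_2^2} = \frac{[(A - \sum_{i \ne t} u_iv_i^T)v_t]_+}{\|v_t\|_2^2} = \frac{[(A - \sum_i u_iv_i^T + u_tv_t^T)v_t]_+}{\|v_t\|_2^2} = \frac{[(A - \sum_i u_iv_i^T)v_t + u_tv_t^Tv_t]_+}{\|v_t\|_2^2} = \left[u_t - \frac{1}{\|v_t\|_2^2}\nabla_{u_t}\right]_+.$$

Similarly, the update for $v_t$ can be rewritten as

$$v_t \leftarrow \left[v_t - \frac{1}{\|u_t\|_2^2}\nabla_{v_t}\right]_+.$$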

Therefore, the new method follows the projected gradient scheme described in the previous section. But it produces the optimal solution in closed form. For each update of a column $v_t$ (or $u_t$), the proposed algorithm requires just a matrix-vector multiplication $R_t^Tu_t$ (or $R_tv_t$), wherein the residue matrix $R_t = A - \sum_{i \ne t} u_iv_i^T$ does not have to be calculated explicitly. Indeed, by calculating $R_t^Tu_t$ (or $R_tv_t$) from $A^Tu_t$ (or $Av_t$) and $\sum_{i \ne t} v_i(u_i^Tu_t)$ (or $\sum_{i \ne t} u_i(v_i^Tv_t)$), the complexity is reduced from $O(mnr + mn)$ to only $O(mn + (m + n)(r - 1))$, which is dominated by $O(mn)$. This implies that each sweep through the $2r$ variables $u_t$'s and $v_t$'s requires only $O(mnr)$ operations, which is equivalent to a sweep of the multiplicative rules and to an inner loop of any gradient method. This is very low since the evaluation of the whole gradient already requires the same complexity.
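
The following sketch shows how $R_t^Tu_t$ can be obtained from $A^Tu_t$ and the inner products $u_i^Tu_t$ without ever forming $R_t$. The function name `residue_times_u` and the small matrices are assumptions for illustration. Indices are zero-based in the code.

```python
def residue_times_u(A, U, V, t):
    """Return R_t^T u_t = A^T u_t - sum_{i != t} v_i (u_i^T u_t)
    without forming the residue matrix R_t."""
    m, n, r = len(A), len(A[0]), len(U[0])
    u = [U[i][t] for i in range(m)]
    out = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # A^T u_t
    for k in range(r):
        if k == t:
            continue
        c = sum(U[i][k] * u[i] for i in range(m))    # scalar u_k^T u_t
        for j in range(n):
            out[j] -= V[j][k] * c                    # subtract v_k (u_k^T u_t)
    return out

A = [[1.0, 2.0], [3.0, 4.0]]
U = [[1.0, 2.0], [0.0, 1.0]]     # columns u_0 = (1, 0), u_1 = (2, 1)
V = [[1.0, 0.0], [1.0, 1.0]]     # columns v_0 = (1, 1), v_1 = (0, 1)
x = residue_times_u(A, U, V, 0)
```

Here $R_0 = A - u_1v_1^T$ has rows $(1, 0)$ and $(3, 3)$, so $R_0^Tu_0 = (1, 0)$, which matches the value computed without building $R_0$.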

Because at each of the $2r$ basic steps of Algorithm 5 we compute an optimal rank-one nonnegative correction to the corresponding error matrix $R_t$, the Frobenius norm of the error cannot increase. This is a reassuring property but it does not imply convergence of the algorithm.

Each vector $u_t$ or $v_t$ lies in a convex set $U_t \subset \mathbb{R}_+^m$ or $V_t \subset \mathbb{R}_+^n$. Moreover, because of the possibility to include scaling we can set an upper bound for $\|U\|$ and $\|V\|$, in such a way that all the $U_t$ and $V_t$ sets can be considered as closed convex. Then, we can use the following Theorem 5.1 to prove a stronger convergence result for Algorithm 5.
Theorem 5.1 Every limit point generated by Algorithm 5 is a stationary point.

Proof We notice that, if $u_t = 0$ and $v_t = 0$ at some stage of Algorithm 5, they will remain zero and no longer take part in any subsequent iteration. We can divide the execution of Algorithm 5 into two phases.

During the first phase, some of the pairs $(u_t, v_t)$ become zero. Because there are only a finite number ($2r$) of such vectors, the number of iterations in this phase is also finite. At the end of this phase, we can rearrange and partition the matrices $U$ and $V$ such that

$$U = (U_+\ 0) \quad\text{and}\quad V = (V_+\ 0),$$

where $U_+$ and $V_+$ do not have any zero column. We temporarily remove zero columns out of the approximation.

During the second phase, no column of $U_+$ and $V_+$ becomes zero, which guarantees that the updates for the columns of $U_+$ and $V_+$ are unique and optimal. Moreover, $\frac{1}{2}\|A - \sum_{i=1}^r u_iv_i^T\|_F^2$ is continuously differentiable over the set $U_1 \times \cdots \times U_r \times V_1 \times \cdots \times V_r$, and the $U_i$'s and $V_i$'s are closed convex. A direct application of Proposition 2.7.1 in [4] proves that every limit point $(U_+^*, V_+^*)$ is a stationary point. It is then easy to prove that if there are zero columns removed at the end of the first phase, adding them back yields another stationary point: $U^* = (U_+^*\ 0)$ and $V^* = (V_+^*\ 0)$ of the required dimension. However, in this case, the rank of the approximation will then be lower than the requested dimension $r$. □

In Algorithm 5, variables are updated in this order: $u_1, v_1, u_2, v_2, \ldots$. We can alternate the variables in a different order as well, for example $u_1, u_2, \ldots, u_r, v_1, v_2, \ldots, v_r, \ldots$. Whenever this is carried out in a cyclic fashion, Theorem 5.1 still holds and this does not increase the complexity of each iteration of the algorithm.

As pointed out above, stationary points given by Algorithm 5 may contain useless zero components. To improve this, one could replace $u_tv_t^T$ ($= 0$) by any nonnegative rank-one approximation that reduces the norm of the error matrix. For example, the substitution

$$u_t = e_{i^*}, \qquad v_t = [R_t^Tu_t]_+, \qquad (13.24)$$

where $i^* = \operatorname{argmax}_i \|[R_t^Te_i]_+\|_2^2$, reduces the error norm by $\|[R_t^Te_{i^*}]_+\|_2^2 > 0$ unless $R_t \le 0$. These substitutions can be done as soon as $u_t$ and $v_t$ start to be zero. If we do these substitutions only a finite number of times before the algorithm starts to converge, Theorem 5.1 still holds. In practice, only a few such substitutions in total are usually needed by the algorithm to converge to a stationary point without any zero component. Note that the matrix rank of the approximation might not be $r$, even when all $u_t$'s and $v_t$'s ($t = 1 \ldots r$) are nonzero.

A possibly better way to fix the problem due to zero components is to use the following damped RRI algorithm in which we introduce $2r$ new dummy variables $w_t \in U_t$ and $z_t \in V_t$, where $t = 1 \ldots r$. The new problem to solve is:




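Problem 3 (Damped Nonnegative Matrix Factorization)

$$\min_{u_i \ge 0,\ v_i \ge 0,\ w_i \ge 0,\ z_i \ge 0} \frac{1}{2}\left\|A - \sum_{i=1}^r u_iv_i^T\right\|_F^2 + \frac{\psi}{2}\sum_i \|u_i - w_i\|_2^2 + \frac{\psi}{2}\sum_i \|v_i - z_i\|_2^2,$$

where the damping factor $\psi$ is a positive constant.

Again, the coordinate descent scheme is applied with the cyclic update order: $u_1, w_1, v_1, z_1, u_2, w_2, v_2, z_2, \ldots$ to result in the following optimal updates for $u_t$, $v_t$, $w_t$ and $z_t$:

$$u_t = \frac{[R_tv_t + \psi w_t]_+}{\|v_t\|_2^2 + \psi}, \quad w_t = u_t, \quad v_t = \frac{[R_t^Tu_t + \psi z_t]_+}{\|u_t\|_2^2 + \psi} \quad\text{and}\quad z_t = v_t, \qquad (13.25)$$

where $t = 1 \ldots r$. The updates $w_t = u_t$ and $z_t = v_t$ can be integrated in the updates of $u_t$ and $v_t$ to yield Algorithm 6. We have the following results: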

Theorem 5.2 Every limit point generated by Algorithm 6 is a stationary point of 
NMF Problem 2. 

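Algorithm 6 (Damped RRI)

Initialize $u_i$'s, $v_i$'s, for $i = 1$ to $r$
repeat
    for $t = 1$ to $r$ do
        $R_t = A - \sum_{i \ne t} u_iv_i^T$
        $v_t \leftarrow \frac{[R_t^Tu_t + \psi v_t]_+}{\|u_t\|_2^2 + \psi}$
        $u_t \leftarrow \frac{[R_tv_t + \psi u_t]_+}{\|v_t\|_2^2 + \psi}$
    end for
until Stopping condition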



Proof Clearly the cost function in Problem 3 is continuously differentiable over the set $U_1 \times \cdots \times U_r \times U_1 \times \cdots \times U_r \times V_1 \times \cdots \times V_r \times V_1 \times \cdots \times V_r$, and the $U_i$'s and $V_i$'s are closed convex. The uniqueness of the global minimum of the elementary problems and a direct application of Proposition 2.7.1 in [4] prove that every limit point of Algorithm 6 is a stationary point of Problem 3.

Moreover, at a stationary point of Problem 3, we have $u_t = w_t$ and $v_t = z_t$, $t = 1 \ldots r$. The cost function in Problem 3 becomes the cost function of the NMF Problem 2. This implies that every stationary point of Problem 3 yields a stationary point of the standard NMF Problem 2. □
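
The closed form of the damped update for $u_t$ can be checked by a short computation. The derivation below is a sketch that assumes the damping term enters the cost as $\frac{\psi}{2}\|u_t - w_t\|_2^2$, with $v_t$, $w_t$ and $R_t$ held fixed. The symbol $g$ is introduced here only for this check.

```latex
\begin{aligned}
g(u) &= \tfrac{1}{2}\|R_t - u v_t^T\|_F^2 + \tfrac{\psi}{2}\|u - w_t\|_2^2 \\
     &= \tfrac{1}{2}\|R_t\|_F^2 + \tfrac{\psi}{2}\|w_t\|_2^2
        - u^T (R_t v_t + \psi w_t)
        + \tfrac{1}{2}\left(\|v_t\|_2^2 + \psi\right)\|u\|_2^2 .
\end{aligned}
```

Each coordinate of $u$ appears in its own convex parabola, so minimizing over $u \ge 0$ amounts to clipping the unconstrained minimizer at zero, which gives $u = [R_tv_t + \psi w_t]_+ / (\|v_t\|_2^2 + \psi)$. Because $\psi > 0$, the denominator never vanishes, so the update stays well defined even when $v_t = 0$.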

This damped version not only helps to eliminate the problem of zero components in the convergence analysis but may also help to avoid zero columns in the approximation when $\psi$ is carefully chosen. But choosing it is not an easy task. Small values of $\psi$ provide an automatic treatment of zeros while not changing the updates of RRI much. Larger values of $\psi$ might help to prevent the vectors $u_t$ and $v_t$ ($t = 1 \ldots r$) from becoming zero too soon. But too large values of $\psi$ limit the updates to only small changes, which will slow down the convergence.

In general, the rank of the approximation can still be lower than the requested 
dimension. Patches may still be needed when a zero component appears. There- 
fore, in our experiments, using the undamped RRI Algorithm 5 with the substi- 
tution (13.24) is still the best choice. 



13.5.3 Variants of the RRI Method 

We now extend the Rank-one Residue Iteration by using a factorization of the type $XDY^T$ where $D$ is diagonal and nonnegative and the columns of the nonnegative matrices $X$ and $Y$ are normalized. The NMF formulation then becomes

$$\min_{x_i \in X_i,\ y_i \in Y_i,\ d_i \in \mathbb{R}_+} \frac{1}{2}\left\|A - \sum_{i=1}^r d_ix_iy_i^T\right\|_F^2$$

where $X_i$'s and $Y_i$'s are sets of normed vectors.

The variants that we present here depend on the choice of $X_i$'s and $Y_i$'s. A generalized Rank-one Residue Iteration method for low-rank approximation is given in Algorithm 7. This algorithm needs to solve a sequence of elementary problems of the type:

$$\max_{s \in S} y^Ts \qquad (13.26)$$

where $y \in \mathbb{R}^n$ and $S \subset \mathbb{R}^n$ is a set of normed vectors. We first introduce a permutation vector $I_y = (i_1\ i_2\ \ldots\ i_n)$ which reorders the elements of $y$ in non-increasing order: $y_{i_k} \ge y_{i_{k+1}}$, $k = 1 \ldots (n - 1)$. The function $p(y)$ returns the number of positive entries of $y$.

Algorithm 7 (GRRI)

Initialize $x_i$'s, $y_i$'s and $d_i$'s, for $i = 1$ to $r$
repeat
    for $t = 1$ to $r$ do
        $R_t = A - \sum_{i \ne t} d_ix_iy_i^T$
        $y_t \leftarrow \operatorname{argmax}_{s \in Y_t} (x_t^TR_ts)$
        $x_t \leftarrow \operatorname{argmax}_{s \in X_t} (y_t^TR_t^Ts)$
        $d_t = x_t^TR_ty_t$
    end for
until Stopping condition
Let us first point out that for the set of normed nonnegative vectors the solution of problem (13.26) is given by $s^* = \frac{y_+}{\|y_+\|_2}$. It then follows that Algorithm 7 is essentially the same as Algorithm 5 since the solutions $v_i$ and $u_i$ of each step of Algorithm 5, given by (13.23), correspond exactly to those of problem (13.26) via the relations $x_i = u_i/\|u_i\|_2$, $y_i = v_i/\|v_i\|_2$ and $d_i = \|u_i\|_2\|v_i\|_2$.

Below we list the sets for which the solution s* of (13.26) can be easily 
computed. 



• Set of normed vectors: $s = \frac{y}{\|y\|_2}$. This is useful when one wants to create factorizations where only one of the factors $U$ or $V$ is nonnegative and the other is a real matrix.

• Set of normed nonnegative vectors: $s = \frac{y_+}{\|y_+\|_2}$.

• Set of normed bounded nonnegative vectors $\{s\}$: where $0 \le l_i \le s_i \le p_i$. The optimal solution of (13.26) is given by:

$$s = \max\left(l, \min\left(p, \frac{y_+}{\|y_+\|_2}\right)\right).$$

• Set of normed binary vectors $\{s\}$: where $s = \frac{b}{\|b\|}$ and $b \in \{0, 1\}^n$. The optimal solution of (13.26) is given by:

$$[s^*]_{i_t} = \begin{cases} \frac{1}{\sqrt{k^*}} & \text{if } t \le k^* \\ 0 & \text{otherwise} \end{cases} \quad\text{where}\quad k^* = \operatorname*{argmax}_k \frac{\sum_{t=1}^k y_{i_t}}{\sqrt{k}}.$$

• Set of normed sparse nonnegative vectors: all normed nonnegative vectors having at most $K$ nonzero entries. The optimal solution for (13.26) is given by norming the following vector $p^*$:

$$[p^*]_{i_t} = \begin{cases} y_{i_t} & \text{if } t \le \min(p(y), K) \\ 0 & \text{otherwise.} \end{cases}$$

• Set of normed fixed-sparsity nonnegative vectors: all nonnegative vectors $s$ with a fixed sparsity, where

$$\operatorname{sparsity}(s) = \frac{\sqrt{n} - \|s\|_1/\|s\|_2}{\sqrt{n} - 1}.$$

The optimal solution for (13.26) is given by using the projection scheme in [15].
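
Two of these elementary problems can be sketched in a few lines of Python: the normed nonnegative case and the normed sparse nonnegative case with at most $K$ nonzero entries. The function names are hypothetical, and both functions assume that $y$ has at least one positive entry.

```python
import math

def best_normed_nonneg(y):
    """argmax of y^T s over s >= 0 with ||s||_2 = 1: the normalized
    positive part of y (assumes y has a positive entry)."""
    yp = [max(0.0, yi) for yi in y]
    nrm = math.sqrt(sum(v * v for v in yp))
    return [v / nrm for v in yp]

def best_normed_sparse_nonneg(y, K):
    """Same problem with at most K nonzeros: keep the (up to) K largest
    positive entries of y, then normalize."""
    order = sorted(range(len(y)), key=lambda i: y[i], reverse=True)
    p = [0.0] * len(y)
    for i in order[:K]:
        if y[i] > 0:
            p[i] = y[i]
    nrm = math.sqrt(sum(v * v for v in p))
    return [v / nrm for v in p]

y = [3.0, -1.0, 4.0, 0.5]
s = best_normed_nonneg(y)
s1 = best_normed_sparse_nonneg(y, 1)
```

With $K = 1$ only the largest entry of $y$ survives, so the solution is a unit coordinate vector.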

One can also imagine other variants, for instance by combining the above ones. Depending on how data need to be approximated, one can create new algorithms provided it is relatively simple to solve problem (13.26). There have been some particular ideas in the literature such as NMF with sparseness constraint [15], Semidiscrete Matrix Decomposition [16] and Semi-Nonnegative Matrix Factorization [9] for which variants of the above scheme can offer an alternative choice of algorithm.

Remark Only the first three sets are the normed version of a closed convex set, as 
required for the convergence by Theorem 5.1. Therefore the algorithms might not 
converge to a stationary point with the other sets. However, the algorithm always 
guarantees a non-increasing update even in those cases and can therefore be 
expected to return a good approximation. 



13.5.4 Nonnegative Tensor Factorization 

If we refer to the problem of finding the nearest nonnegative vector to a given 
vector a as the nonnegative approximation in one dimension, the NMF is its 
generalization in two dimensions and naturally, it can be extended to even higher- 
order tensor approximation problems. Algorithms described in the previous section 
use the closed form solution of the one dimensional problem to solve the two- 
dimensional problem. We now generalize this to higher orders. Since in one 
dimension such an approximation is easy to construct, we continue to use this 
approach to build the solutions for higher order problems. 

For a low-rank tensor, there are two popular kinds of factored tensors, namely 
those of Tucker and Kruskal [2]. We only give an algorithm for finding approx- 
imations of Kruskal type. It is easy to extend this to tensors of Tucker type, but this 
is omitted here. 

Given a d-dimensional tensor T, we will derive an algorithm for approximating
a nonnegative tensor by a rank-r nonnegative Kruskal tensor
$S \in \mathbb{R}_+^{n_1 \times n_2 \times \cdots \times n_d}$
represented as a sum of r rank-one tensors:

$$S = \sum_{i=1}^{r} \sigma_i \, u_{1i} * u_{2i} * \cdots * u_{di}$$

where $\sigma_i \in \mathbb{R}_+$ is a scaling factor, $u_{ti} \in \mathbb{R}_+^{n_t}$ is a normed vector (i.e. $\|u_{ti}\|_2 = 1$) and
a * b stands for the outer product between two vectors or tensors a and b.

The following update rules are the generalization of the matrix case to the
higher order tensor:

$$y = (\cdots((\cdots(R_k u_{1k}) \cdots u_{(t-1)k})\, u_{(t+1)k}) \cdots)\, u_{dk} \tag{13.27}$$

$$\sigma_k = \|[y]_+\|_2, \qquad u_{tk} = \frac{[y]_+}{\sigma_k}, \tag{13.28}$$

where $R_k = T - \sum_{i \neq k} \sigma_i \, u_{1i} * u_{2i} * \cdots * u_{di}$ is the residue tensor calculated without
the kth component of S and $R_k u_{ij}$ is the ordinary tensor/vector product in the
corresponding dimension.

We can then produce an algorithm which updates in a cyclic fashion all vectors
$u_{ji}$. This is in fact a direct extension of Algorithm 5. One can carry out the same
discussion about the convergence here to guarantee that each limit point of this
algorithm is a stationary point for the nonnegative tensor factorization problem
and to improve the approximation quality.

Again, as we have seen in the previous section, we can extend the procedure to
take into account different constraints on the vectors $u_{ij}$ such as discreteness,
sparseness, etc.
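The cyclic update (13.27)-(13.28) can be sketched for the third-order case as follows. This is only an illustration under our own assumptions: the function names `rank_one` and `ntf_rri`, the random initialization, the zero initial scaling factors and the handling of a zero vector $[y]_+$ are choices made here, not part of the text.

```python
import numpy as np

def rank_one(a, b, c):
    """Outer product a * b * c of three vectors (a third-order tensor)."""
    return np.einsum('i,j,k->ijk', a, b, c)

def ntf_rri(T, r, n_sweeps=50, seed=0):
    """Sketch of cyclic rank-one residue updates for a 3rd-order tensor."""
    rng = np.random.default_rng(seed)
    T = np.array(T, dtype=float)
    # U[t] has shape (n_t, r); its columns are kept at unit 2-norm.
    U = [rng.random((n, r)) for n in T.shape]
    U = [F / np.linalg.norm(F, axis=0) for F in U]
    sigma = np.zeros(r)            # scaling factors; S = 0 initially
    R = T.copy()                   # full residual T - S
    # Contraction of a tensor with the two vectors not in mode t.
    specs = ['ijk,j,k->i', 'ijk,i,k->j', 'ijk,i,j->k']
    for _ in range(n_sweeps):
        for k in range(r):
            for t in range(3):
                # Residue tensor R_k: residual without the k-th component.
                Rk = R + sigma[k] * rank_one(U[0][:, k], U[1][:, k], U[2][:, k])
                others = [U[s][:, k] for s in range(3) if s != t]
                y = np.maximum(np.einsum(specs[t], Rk, *others), 0.0)  # [y]_+
                norm_y = np.linalg.norm(y)
                if norm_y > 0:
                    sigma[k] = norm_y
                    U[t][:, k] = y / norm_y
                else:
                    sigma[k] = 0.0   # component switched off, vector kept
                R = Rk - sigma[k] * rank_one(U[0][:, k], U[1][:, k], U[2][:, k])
    return sigma, U

# Usage on a small synthetic nonnegative tensor of rank 2.
rng = np.random.default_rng(1)
A, B, C = rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2))
T = np.einsum('ir,jr,kr->ijk', A, B, C)
sigma, U = ntf_rri(T, r=2)
S = np.einsum('r,ir,jr,kr->ijk', sigma, U[0], U[1], U[2])
rel_err = np.linalg.norm(T - S) / np.linalg.norm(T)
print(rel_err)
```

Each inner step solves its one-vector subproblem exactly, so the residual norm cannot increase from one update to the next.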

The approach proposed here is again different from that in [8], where a similar
cascade procedure for multilayer nonnegative matrix factorization is used to
compute a 3D tensor approximation. Clearly, the approximation error of that
procedure will be higher than that of our proposed method, since here the cost
function is minimized by taking into account all the dimensions.



13.5.5 Regularizations 

Regularization is a common way to cope with the ill-posedness of inverse
problems. When some additional information about the solution is known, one may
want to impose a priori constraints on the algorithms, such as smoothness,
sparsity, discreteness, etc. To add such regularizations to the RRI algorithms, it
is possible to modify the NMF cost function by adding some regularizing terms.
We list here the updates for the $u_t$'s and $v_t$'s when some simple regularizations are
added to the original cost function. The proofs of these updates are
straightforward and hence omitted.

• One-norm $\|\cdot\|_1$ regularization: the one-norm of the vector variable can be added
as a heuristic for finding a sparse solution. This is an alternative to the fixed-
sparsity variant presented above. The regularized cost function with respect to
the variable $v_t$ will be

$$\frac{1}{2}\|R_t - u_t v^T\|_F^2 + \beta\|v\|_1, \quad \beta > 0$$

where the optimal update is given by

$$v_t^* = \frac{[R_t^T u_t - \beta 1_{n \times 1}]_+}{\|u_t\|_2^2}.$$

The constant $\beta > 0$ can be varied to control the trade-off between the
approximation error $\frac{1}{2}\|R_t - u_t v^T\|_F^2$ and $\|v\|_1$. From this update, one can see that
this works by zeroing out elements of $R_t^T u_t$ which are smaller than $\beta$, hence
reducing the number of nonzero elements of $v_t^*$.
• Smoothness regularization $\|v - B\hat{v}_t\|_F^2$: here $\hat{v}_t$ is the current value of $v_t$ and
the matrix B helps to calculate the average of the neighboring elements at each
element of v. When v is a 1D smooth function, B can be the following n × n
matrix:

$$B = \begin{pmatrix} 0 & 1 & \cdots & \cdots & 0 \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix} \tag{13.29}$$

This matrix can be defined in a different way to take the true topology of v into
account, for instance v = vec(F) where F is a matrix. The regularized cost
function with respect to the variable $v_t$ will be

$$\frac{1}{2}\|R_t - u_t v^T\|_F^2 + \frac{\delta}{2}\|v - B\hat{v}_t\|_F^2, \quad \delta > 0$$

where the optimal update is given by

$$v_t^* = \frac{[R_t^T u_t + \delta B\hat{v}_t]_+}{\|u_t\|_2^2 + \delta}.$$

The constant $\delta > 0$ can be varied to control the trade-off between the
approximation error $\frac{1}{2}\|R_t - u_t v^T\|_F^2$ and the smoothness of $v_t$ at the fixed point.
From the update, one can see that this works by searching for the optimal
update $v_t^*$ with some preference for the neighborhood of $B\hat{v}_t$, i.e., a smoothed
vector of the current value $\hat{v}_t$.

The two above regularizations can be added independently to each of the
columns of U and/or V. The trade-off factor $\beta$ (or $\delta$) can be different for each
column. A combination of different regularizations on a column (for instance $v_t$)
can also be used to solve the multi-criterion problem

$$\frac{1}{2}\|R_t - u_t v^T\|_F^2 + \frac{\gamma}{2}\|v\|_2^2 + \beta\|v\|_1 + \frac{\delta}{2}\|v - B\hat{v}_t\|_F^2, \quad \beta, \gamma, \delta > 0$$

where the optimal update is given by

$$v_t^* = \frac{[R_t^T u_t - \beta 1_{n \times 1} + \delta B\hat{v}_t]_+}{\|u_t\|_2^2 + \gamma + \delta}.$$
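The regularized updates of this section all have the same closed form, which the sketch below evaluates. It assumes the combined update $[R_t^T u_t - \beta 1 + \delta B\hat{v}_t]_+ / (\|u_t\|_2^2 + \gamma + \delta)$ and the averaging matrix B of (13.29) as we read them; the function names `averaging_matrix` and `regularized_update` are ours.

```python
import numpy as np

def averaging_matrix(n):
    """Neighbour-averaging matrix B in the sense of (13.29), for n >= 2.

    Interior rows average the two neighbours; the first and last rows
    copy the single available neighbour.
    """
    B = np.zeros((n, n))
    for i in range(1, n - 1):
        B[i, i - 1] = 0.5
        B[i, i + 1] = 0.5
    B[0, 1] = 1.0
    B[n - 1, n - 2] = 1.0
    return B

def regularized_update(R, u, v_current, beta=0.0, gamma=0.0, delta=0.0, B=None):
    """One regularized update of v for fixed u and residue matrix R.

    beta weights the one-norm term, gamma the two-norm term and delta
    the smoothness term; all zeros gives the plain update.
    """
    n = R.shape[1]
    if B is None:
        B = averaging_matrix(n)
    numer = R.T @ u - beta * np.ones(n) + delta * (B @ v_current)
    return np.maximum(numer, 0.0) / (u @ u + gamma + delta)

rng = np.random.default_rng(0)
R = rng.standard_normal((6, 5))
u = rng.random(6)
v_hat = rng.random(5)                                   # current value of v
v_plain = regularized_update(R, u, v_hat)               # no regularization
v_sparse = regularized_update(R, u, v_hat, beta=0.5)    # one-norm term only
v_smooth = regularized_update(R, u, v_hat, delta=10.0)  # smoothness term only
```

Raising `beta` can only zero out more entries, and a large `delta` pulls the update toward the smoothed vector `B @ v_hat`.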

The one-norm regularization as well as the two-norm regularization can be
found in [1, 3]. A major difference with that method is that the norm constraints are
added to the rows rather than to the columns of V or U as done here. However, for
the two versions of the one-norm regularization, the effects are somewhat similar.
The two-norm regularization on the columns of U and V, by contrast, is simply a
scaling effect, which changes nothing in the RRI algorithm. We therefore only test the
smoothness regularization at the end of the chapter with some numerically generated
data.

For more extensions and variants, see [13]. 



13.6 Experiments 

Here we present several experiments to compare the different descent algorithms 
presented in this paper. For all the algorithms, the scaling scheme proposed in 
Sect. 13.4 was applied. 



13.6.1 Random Matrices 

We generated 100 random nonnegative matrices of different sizes. We used seven 
different algorithms to approximate each matrix: 

• the multiplicative rule (Mult),

• alternating least squares using the Matlab function lsqnonneg (ALS),

• a full space search using line search and Armijo criterion (FLine), 

• a coordinate search alternating on U and V, and using line search and Armijo 
criterion (CLine), 

• a full space search using first-order approximation (FFO), 

• a coordinate search alternating on U and V, and using first-order approximation 
(CFO) 

• an iterative rank-one residue approximation (RRI). 

For each matrix, the same starting point is used for every algorithm. We create 
a starting point by randomly generating two matrices U and V and then rescaling 
them to yield a first approximation of the original matrix A as proposed in Sect. 
13.4: 



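$$U = U D \sqrt{\alpha}, \qquad V = V D^{-1} \sqrt{\alpha},$$

where

$$\alpha := \frac{\langle A, UV^T \rangle}{\langle UV^T, UV^T \rangle} \quad \text{and} \quad D_{ij} = \begin{cases} \sqrt{\|V_{:i}\|_2 / \|U_{:i}\|_2} & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$$

From (13.15), we see that when approaching a KKT stationary point of the
problem, the above scaling factor $\alpha \to 1$. This implies that every KKT stationary
point of this problem is scale-invariant.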
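The rescaling of a starting point can be sketched as below. It assumes the scaling factor $\alpha = \langle A, UV^T \rangle / \langle UV^T, UV^T \rangle$ and a diagonal D that balances the column norms of U and V; the function name `rescale` is ours.

```python
import numpy as np

def rescale(A, U, V):
    """Rescale (U, V) so that U V^T is the best multiple of itself for A.

    alpha minimizes ||A - alpha * U V^T||_F; the diagonal scaling d
    equalizes the 2-norms of matching columns of U and V.
    Assumes nonnegative inputs with nonzero columns.
    """
    UVt = U @ V.T
    alpha = np.sum(A * UVt) / np.sum(UVt * UVt)
    d = np.sqrt(np.linalg.norm(V, axis=0) / np.linalg.norm(U, axis=0))
    U_new = U * d * np.sqrt(alpha)      # U D sqrt(alpha)
    V_new = V / d * np.sqrt(alpha)      # V D^{-1} sqrt(alpha)
    return U_new, V_new, alpha

rng = np.random.default_rng(0)
A = rng.random((8, 6))
U, V = rng.random((8, 3)), rng.random((6, 3))
U2, V2, alpha = rescale(A, U, V)
_, _, alpha2 = rescale(A, U2, V2)       # a second rescaling changes nothing
print(alpha, alpha2)
```

Applying the rescaling twice returns a factor of one the second time, which mirrors the remark that the scaling factor tends to 1 near a stationary point.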
The algorithms are all stopped when the projected gradient norm is lower than $\epsilon$
times the gradient norm at the starting point or when it takes more than 45 s. The
relative precisions $\epsilon$ are chosen equal to $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$, $10^{-6}$. No limit was
imposed on the number of iterations.

For the alternating gradient algorithms CLine and CFO, we use different precisions
$\epsilon_U$ and $\epsilon_V$ for each of the inner iterations for U and for V as suggested in [19], where
$\epsilon_U$ and $\epsilon_V$ are initialized by $10^{-3}$. When the inner loop for U or V needs no
iteration to reach the precision $\epsilon_U$ or $\epsilon_V$, one more digit of precision is added
to $\epsilon_U$ or $\epsilon_V$ (i.e. $\epsilon_U = \epsilon_U/10$ or $\epsilon_V = \epsilon_V/10$).

Table 13.1 shows that for all sizes and ranks, Algorithm RRI is the fastest to 
reach the required precision. Even though it is widely used in practice, algorithm 
Mult fails to provide solutions to the NMF problem within the allocated time. A 
further investigation shows that the algorithm gets easily trapped in boundary 
points where some $U_{ij}$ and/or $V_{ij}$ is zero while $\nabla_{U_{ij}}$ and/or $\nabla_{V_{ij}}$ is negative, hence
violating one of the KKT conditions (13.11). The multiplicative rules then fail to 
move and do not return to a local minimizer. A slightly modified version of this 
algorithm was given in [20], but it needs to wait to get sufficiently close to such 
points before attempting an escape, and is therefore also not efficient. The ALS 
algorithm can return a stationary point, but it takes too long. 

We select five methods: FLine, CLine, FFO, CFO and RRI for a more detailed 
comparison. For each matrix A, we run these algorithms with 100 different starting 
points. Figures 13.1, 13.2, 13.3 and 13.4 show the results with some different 
settings. One can see that, when the approximation errors are almost the same
between the algorithms, RRI is the best overall in terms of running times. This is
probably because the RRI algorithm chooses only one vector $u_t$ or $v_t$ to optimize at
once. This allows the algorithm to move optimally along a partial direction rather
than just taking a small step along a more global direction. Furthermore, the computational
load for an update is very small: only one matrix-vector multiplication is needed.
All these factors make the running time of the RRI algorithm very attractive. 
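The one-vector-at-a-time structure of RRI can be sketched as follows. This is a simplified illustration under our own assumptions: the function name `rri_nmf`, the random initialization and the rule of skipping an update when the fixed vector is zero are choices made here, and the scaling and stopping criteria of the experiments are left out.

```python
import numpy as np

def rri_nmf(A, r, n_sweeps=100, seed=0):
    """Sketch of Rank-one Residue Iteration for A ~ U V^T with U, V >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U, V = rng.random((m, r)), rng.random((n, r))
    R = A - U @ V.T                       # full residual
    errors = [np.linalg.norm(R)]          # Frobenius error per sweep
    for _ in range(n_sweeps):
        for t in range(r):
            # Residue matrix R_t: residual without the t-th rank-one term.
            Rt = R + np.outer(U[:, t], V[:, t])
            vv = V[:, t] @ V[:, t]
            if vv > 0:                    # optimal u_t for fixed v_t
                U[:, t] = np.maximum(Rt @ V[:, t], 0.0) / vv
            uu = U[:, t] @ U[:, t]
            if uu > 0:                    # optimal v_t for fixed u_t
                V[:, t] = np.maximum(Rt.T @ U[:, t], 0.0) / uu
            R = Rt - np.outer(U[:, t], V[:, t])
        errors.append(np.linalg.norm(R))
    return U, V, errors

rng = np.random.default_rng(1)
A = rng.random((20, 15))
U, V, errors = rri_nmf(A, r=4)
print(errors[0], errors[-1])
```

Each vector update costs one matrix-vector product with the residue matrix, and the recorded error sequence is nonincreasing by construction.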



13.6.2 Image Data 

The following experiments use the Cambridge ORL face database as the input data.
The database contains 400 images of 40 persons (10 images per person). The size of
each image is 112 × 92 with 256 gray levels per pixel representing a front view of
the face of a person. The images are then transformed into 400 "face vectors" in
$\mathbb{R}^{10304}$ (112 × 92 = 10304) to form the data matrix A of size 10304 × 400. We
used three weight matrices of the same size as A (i.e. 10304 × 400). Since it was
used in [18], this data has become the standard benchmark for NMF algorithms.

In the first experiment, we run six NMF algorithms described above on this data
for the reduced rank of 49. The original matrix A is constituted by transforming
each image into one of its columns. Figure 13.5 shows for the six algorithms the
evolution of the error versus the number of iterations. Because the minimization 
Table 13.1 Comparison of average successful running time of algorithms over 100 random
matrices

ε        Mult       ALS         FLine       CLine       FFO         CFO         RRI

(m = 50, n = 20, r = 2)
10^-2    0.02(96)   0.40        0.04        0.02        0.02        0.01        0.01
10^-3    0.08(74)   1.36        0.12        0.09        0.05        0.04        0.03
10^-4    0.17(71)   2.81        0.24        0.17        0.11        0.08        0.05
10^-5    0.36(64)   4.10        0.31        0.25        0.15        0.11        0.07
10^-6    0.31(76)   4.74        0.40        0.29        0.19        0.15        0.09

(m = 100, n = 50, r = 5)
10^-2    45*(0)     3.48        0.10        0.09        0.09        0.04        0.02
10^-3    45*(0)     24.30(96)   0.59        0.63        0.78        0.25        0.15
10^-4    45*(0)     45*(0)      2.74        2.18        3.34        0.86        0.45
10^-5    45*(0)     45*(0)      5.93        4.06        6.71        1.58        0.89
10^-6    45*(0)     45*(0)      7.23        4.75        8.98        1.93        1.30

(m = 100, n = 50, r = 10)
10^-2    45*(0)     11.61       0.28        0.27        0.18        0.11        0.05
10^-3    45*(0)     41.89(5)    1.90        2.11        1.50        0.74        0.35
10^-4    45*(0)     45*(0)      7.20        5.57        5.08        2.29        1.13
10^-5    45*(0)     45*(0)      12.90       9.69        10.30       4.01        1.71
10^-6    45*(0)     45*(0)      14.62(99)   11.68(99)   13.19       5.26        2.11

(m = 100, n = 50, r = 15)
10^-2    45*(0)     25.98       0.66        0.59        0.40        0.20        0.09
10^-3    45*(0)     45*(0)      3.90        4.58        3.18        1.57        0.61
10^-4    45*(0)     45*(0)      16.55(98)   13.61(99)   9.74        6.12        1.87
10^-5    45*(0)     45*(0)      21.72(97)   17.31(92)   16.59(98)   7.08        2.39
10^-6    45*(0)     45*(0)      25.88(89)   19.76(98)   19.20(98)   10.34       3.66

(m = 100, n = 100, r = 20)
10^-2    45*(0)     42.51(4)    1.16        0.80        0.89        0.55        0.17
10^-3    45*(0)     45*(0)      9.19        8.58        10.51       5.45        1.41
10^-4    45*(0)     45*(0)      28.59(86)   20.63(94)   29.89(69)   12.59       4.02
10^-5    45*(0)     45*(0)      32.89(42)   27.94(68)   34.59(34)   18.83(90)   6.59
10^-6    45*(0)     45*(0)      37.14(20)   30.75(60)   36.48(8)    22.80(87)   8.71

(m = 200, n = 100, r = 30)
10^-2    45*(0)     45*(0)      2.56        2.20        2.68        1.31        0.44
10^-3    45*(0)     45*(0)      22.60(99)   25.03(98)   29.67(90)   12.94       4.12
10^-4    45*(0)     45*(0)      36.49(2)    39.13(13)   45*(0)      33.33(45)   14.03
10^-5    45*(0)     45*(0)      45*(0)      39.84(2)    45*(0)      37.60(6)    21.96(92)
10^-6    45*(0)     45*(0)      45*(0)      45*(0)      45*(0)      45*(0)      25.61(87)

Time limit is 45 s. 0.02(96) means that a result is returned with the required precision ε within
45 s for 96 (of 100) matrices, of which the average running time is 0.02 s. 45*(0): failed in all
100 matrices
Fig. 13.1 Comparison of selected algorithms for e = 10"

process is different in each algorithm, we will say that one iteration corresponds to
all elements of both U and V being updated. Figure 13.6 shows the evolution of 
the error versus time. Since the work of one iteration varies from one algorithm to 
another, it is crucial to plot the error versus time to get a fair comparison between 
the different algorithms. In the two figures, we can see that the RRI algorithm 



[Figure: two panels titled "m = 100, n = 50, r = 15"; horizontal axis: Algorithm (FLine, CLine, FFO, CFO, RRI)]

Fig. 13.3 Comparison of selected algorithms for e = 10"

[Figure: two panels titled "m = 100, n = 100, r = 20"; horizontal axis: Algorithm (FLine, CLine, FFO, CFO, RRI)]

Fig. 13.4 Comparison of selected algorithms for e = 10



behaves very well on this dataset. Since the computational load of each iteration
is small and constant (without inner loop), this algorithm converges faster than the
others.

Fig. 13.5 NMF: error vs. iterations

Fig. 13.6 NMF: error vs. time



In the second experiment, we construct a third-order nonnegative tensor
approximation. We first build a tensor by stacking all 400 images to have a
112 × 92 × 400 nonnegative tensor. Using the proposed algorithm, a rank-142
nonnegative tensor is calculated to approximate this tensor. Figure 13.7 shows the
result for six images chosen randomly from the 400 images. Their approximations
given by the rank-142 nonnegative tensor are much better than those given by
the rank-8 nonnegative matrix, even though they require similar storage space:
8 × (112 × 92 + 400) = 85632 and 142 × (112 + 92 + 400) = 85768. The rank-8
truncated SVD approximation (i.e. $[A_8]_+$) is also included for reference.

Fig. 13.7 Tensor factorization vs. matrix factorization on facial data. Six randomly chosen
images from 400 of ORL dataset. From top to bottom: original images, their rank-8 truncated
SVD approximation, their rank-142 nonnegative tensor approximation (150 RRI iterations) and
their rank-8 nonnegative matrix approximation (150 RRI iterations)

In the third experiment, we apply the variants of RRI algorithm mentioned in 
Sect. 13.5 to the face databases. The following settings are compared: 

Original: original faces from the databases.

49NMF: standard factorization (nonnegative vectors), r = 49.

100Binary: columns of U are limited to the scaled binary vectors, r = 100.

49Sparse10: columns of U are sparse. Not more than 10% of the elements of each
column of A are positive, r = 49.

49Sparse20: columns of U are sparse. Not more than 20% of the elements of each
column of A are positive, r = 49.

49HSparse60: columns of U are sparse. The Hoyer sparsity of each column of U
is 0.6, r = 49.

49HSparse70: columns of U are sparse. The Hoyer sparsity of each column of U
is 0.7, r = 49.

49HBSparse60: columns of U are sparse. The Hoyer sparsity of each column of U
is 0.6. Columns of V are scaled binary, r = 49.

49HBSparse70: columns of U are sparse. The Hoyer sparsity of each column of U
is 0.7. Columns of V are scaled binary, r = 49.

For each setting, we use the RRI algorithm to compute the corresponding
factorization. Some randomly selected faces are reconstructed by these settings as shown
in Fig. 13.8. For each setting, the RRI algorithm produces a different set of bases to
approximate the original faces. When the columns of V are constrained to scaled
binary vectors (100Binary), the factorization can be rewritten as $UV^T = \hat{U}B^T$,
where B is a binary matrix. This implies that each image is reconstructed by just
the presence or absence of 100 bases shown in Fig. 13.9.

[Figure rows: Original, 49NMF, 100Binary, 49Sparse10, 49Sparse20, 49HSparse60, 49HSparse70, 49HBSparse60, 49HBSparse70]

Fig. 13.8 Nonnegative matrix factorization with several sparse settings

Fig. 13.9 Bases from 100Binary setting

Figures 13.10 and 13.11 show nonnegative bases obtained by imposing some
sparsity on the columns of V. The sparsity can be easily controlled by the
percentages of positive elements or by the Hoyer sparsity measure.

Fig. 13.10 Sparse bases 49Sparse20 and 49Sparse10. Maximal percentage of positive elements
is 20% (a) and 10% (b)

Fig. 13.11 Hoyer sparse bases 49HSparse60 and 49HSparse70. Sparsity of bases is 0.6 (a) and
0.7 (b)

Fig. 13.12 Hoyer sparse bases 49HBSparse60 and 49HBSparse70. Sparsity of bases is 0.6 (a) and
0.7 (b). V is binary matrix

Figure 13.12 combines the sparsity of the bases (columns of U) and the binary
representation of V. The sparsity is measured by the Hoyer measure as in
Fig. 13.11. Using only the absence or presence of these 49 features, faces are
approximated as shown in the last two rows of Fig. 13.8.

The above examples show how to use the variants of the RRI algorithm to
control the sparsity of the bases. One can see that the sparser the bases are, the less
storage is needed to store the approximation. Moreover, this provides a part-based
decomposition using local features of the faces.



13.6.3 Smooth Approximation 



We carry out this experiment to test the new smoothness constraint introduced in
the previous section:

$$\frac{1}{2}\|R_i - u_i v^T\|_F^2 + \frac{\delta}{2}\|v - B\hat{v}_i\|_F^2, \quad \delta > 0$$

where B is defined in (13.29).

We generate the data using four smooth nonnegative functions $f_1$, $f_2$, $f_3$ and $f_4$,
described in Fig. 13.13, where each function is represented as a nonnegative vector 
of size 200. 

We then generate a matrix A containing 100 mixtures of these functions as
follows

$$A = \max(FE^T + N, 0)$$

where $F = [f_1\ f_2\ f_3\ f_4]$, E is a random nonnegative matrix and N is normally
distributed random noise with $\|N\|_F = 0.2\|FE^T\|_F$. Four randomly selected columns
of A are plotted in Fig. 13.14.

Fig. 13.14 Randomly selected generated data
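The construction of the noisy mixtures can be sketched as below. The four smooth functions of Fig. 13.13 are not specified in the text, so the Gaussian bumps used here (their centers and width) are purely hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
centers = [0.2, 0.4, 0.6, 0.8]            # hypothetical bump locations
# F holds four smooth nonnegative functions as columns (200 x 4).
F = np.stack([np.exp(-((x - c) / 0.08) ** 2) for c in centers], axis=1)
E = rng.random((100, 4))                  # random nonnegative mixing matrix
clean = F @ E.T                           # 200 x 100 noise-free mixtures
N = rng.standard_normal(clean.shape)
# Scale the noise so that ||N||_F = 0.2 * ||F E^T||_F.
N *= 0.2 * np.linalg.norm(clean) / np.linalg.norm(N)
A = np.maximum(clean + N, 0.0)            # clip negative entries to zero
print(A.shape)
```

For a two-dimensional array `np.linalg.norm` returns the Frobenius norm by default, which is the norm used in the noise level above.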
Fig. 13.15 Original functions vs. reconstructed functions

We ran the regularized RRI algorithm to force the smoothness of columns of U.
We apply, for each run, the same value of $\delta$ for all the columns of
U: $\delta = 0, 10, 100$. The results obtained through these runs are presented in
Fig. 13.15. We see that, without regularization, i.e. $\delta = 0$, the noise is present in
the approximation, which produces nonsmooth solutions. When increasing the
regularizing terms, i.e. $\delta = 10, 100$, the reconstructed functions become smoother
and the shape of the original functions is well preserved.

This smoothing technique can be used for applications like that in [27], where
smooth spectral reflectance data from space objects is unmixed. There, the multiplicative
rules are modified by adding the two-norm regularizations on the factors U and V to
enforce the smoothness. This is a different approach; therefore, a comparison
should be carried out.

We have described a new method for nonnegative matrix factorization that
converges well and fast. Moreover, it is flexible enough to create variants
and to add constraints. The numerical experiments show that this
method and its derived variants behave very well with different types of data. This
gives enough motivation to extend it to other types of data and applications in the
future.
13.7 Conclusion

This paper focuses on the descent methods for Nonnegative Matrix Factorization,
which are characterized by nonincreasing updates at each iteration.

We also present the Rank-one Residue Iteration algorithm for computing an
approximate Nonnegative Matrix Factorization. It recursively uses nonnegative
rank-one approximations of a residual matrix that is not necessarily nonnegative.
This algorithm requires no parameter tuning, has nice properties and typically
converges quite fast. It also has many potential extensions. During the revision of
this report, we were informed that essentially the same algorithm was published in
an independent contribution [7] and also mentioned later in an independent
personal communication [10].
Chapter 14

A Computational Method for Symmetric 

Stein Matrix Equations 



K. Jbilou and A. Messaoudi 



Abstract In the present paper, we propose a numerical method for solving the
sparse symmetric Stein equation $AXA^T - X + BB^T = 0$. Such problems appear in
control problems, filtering and image restoration. The proposed method is a Krylov 
subspace method based on the global Arnoldi algorithm. We apply the global 
Arnoldi algorithm to extract low-rank approximate solutions to Stein matrix 
equations. We give some theoretical results and report some numerical experi- 
ments to show the effectiveness of the proposed method. 



14.1 Introduction 

In this paper, we present a numerical Krylov subspace method for solving the Stein
(discrete-time Lyapunov) matrix equation

$$AXA^T - X + BB^T = 0 \tag{14.1}$$

where A is an n × n real and sparse matrix, $X \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times s}$ with
rank(B) = s and $s \ll n$.

We assume throughout this paper that $\lambda_i(A)\lambda_j(A) \neq 1$ for all i, j = 1, ..., n
($\lambda_i(A)$ denotes the ith eigenvalue of the matrix A) and this condition ensures that
the solution X of the problem (14.1) exists and is unique.

Lyapunov and discrete-time Lyapunov equations play a crucial role in linear
control and filtering theory for continuous or discrete-time large-scale dynamical 
systems and other problems; see [3-5, 14, 15, 20, 23] and the references therein. 
They also appear in each step of Newton's method for discrete-time algebraic 
Riccati equations [8, 13]. Equation (14.1) is also referred to as discrete-time 
Lyapunov equation. 

Direct methods for solving the matrix equation (14.1) such as those proposed in
[1] are attractive if the matrices are of small size. These methods are based on the
Bartels-Stewart algorithm [2] or on the Hessenberg-Schur method [7].

Iterative methods such as the squared Smith method [6, 16, 21] have been
proposed for dense Stein equations. All of these methods compute the solution in
dense form and hence require $O(n^2)$ storage and $O(n^3)$ operations. Notice that the
matrix equation (14.1) can be formulated as an $n^2 \times n^2$ large linear system using
the Kronecker formulation $(A \otimes A - I_{n^2})\,\mathrm{vec}(X) = -\mathrm{vec}(BB^T)$ where $\otimes$ denotes
the Kronecker product ($F \otimes D = [f_{ij}D]$), vec(X) is the vector of $\mathbb{R}^{n^2}$ formed by
stacking the columns of the matrix X and $I_n$ is the identity matrix of order n × n.
Krylov subspace methods could be used to solve the above linear system. However,
for large problems this approach is very expensive and does not take into
account the low rank of the right hand side.
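For a small problem, the Kronecker formulation can be solved directly, which also shows why it does not scale: the system matrix has $n^2$ rows. The sketch below is an illustration only; the function name `stein_kron` and the random test data are ours.

```python
import numpy as np

def stein_kron(A, B):
    """Solve A X A^T - X + B B^T = 0 through the n^2 x n^2 Kronecker system.

    Uses vec(A X A^T) = (A kron A) vec(X) with column-stacking vec,
    hence the Fortran-order flatten and reshape.
    """
    n = A.shape[0]
    lhs = np.kron(A, A) - np.eye(n * n)
    rhs = -(B @ B.T).flatten(order="F")
    x = np.linalg.solve(lhs, rhs)
    return x.reshape((n, n), order="F")

rng = np.random.default_rng(0)
n, s = 6, 2
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius 0.9
B = rng.standard_normal((n, s))
X = stein_kron(A, B)
residual = np.linalg.norm(A @ X @ A.T - X + B @ B.T)
print(residual)
```

Already for n in the thousands the dense $n^2 \times n^2$ matrix is out of reach, which motivates the low-rank Krylov approach of this chapter.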

The observability $G_o$ and controllability $G_g$ Gramians of the discrete-time linear
time-invariant (LTI) system

$$x(k+1) = Ax(k) + Bu(k)$$

$$y(k) = Dx(k), \quad k = 0, 1, \ldots$$

are the solutions of the Stein equations

$$A^T G_o A - G_o + D^T D = 0$$

and

$$A G_g A^T - G_g + BB^T = 0$$

where the input $u(\cdot) \in \mathbb{R}^s$ and the output $y(\cdot) \in \mathbb{R}^q$.

In many applications, n is quite large while the number of inputs s and outputs
q usually satisfy $s, q \ll n$.

The Gramians of LTI systems play an important role in many analysis and 
design problems for LTI systems such as model reduction, the computation of the 
Hankel norm or $H_2$-norm of the system; see [4, 20, 23]. The problem of model
reduction is to produce a low dimensional LTI system that will have approxi- 
mately the same response as the original system to any given input u. 
Recently, several schemes using Krylov subspace methods have been developed
to produce low-rank approximate solutions to Lyapunov and discrete-time
Lyapunov equations with low-rank right hand sides [9, 10, 12, 18, 22].

If the symmetric Stein equation (14.1) is Schur stable, that is if $\rho(A) < 1$, where
$\rho(A)$ denotes the spectral radius of A, then (14.1) has a unique solution X given by
[13]:

$$X = \sum_{i=0}^{\infty} A^i B B^T (A^T)^i.$$
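The series $X = \sum_{i \ge 0} A^i BB^T (A^T)^i$, valid when $\rho(A) < 1$, can be checked numerically by truncating it. The sketch below is an illustration with names and test data of our own choosing; the number of terms is an arbitrary choice.

```python
import numpy as np

def stein_series(A, B, n_terms=200):
    """Truncated series sum_{i < n_terms} A^i B B^T (A^T)^i."""
    n = A.shape[0]
    X = np.zeros((n, n))
    term = B.copy()                 # holds A^i B
    for _ in range(n_terms):
        X += term @ term.T
        term = A @ term
    return X

rng = np.random.default_rng(0)
n, s = 6, 2
A = rng.standard_normal((n, n))
A *= 0.5 / np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius 0.5
B = rng.standard_normal((n, s))
X = stein_series(A, B)
residual = np.linalg.norm(A @ X @ A.T - X + B @ B.T)
print(residual)
```

The residual of the truncated sum equals the norm of the first neglected term, so it decays at the rate set by the spectral radius.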

In this paper, we present a new Krylov subspace approach to solve the prob- 
lem (14.1). The proposed method takes advantage of the low-rank structure of 
(14.1) to obtain low-rank approximate solutions in factored form. Our algorithm 
uses a Galerkin projection method and is based on the global Arnoldi algorithm 
[11]. 

The remainder of the paper is organized as follows: In Sect. 14.2, we review the 
global Arnoldi algorithm. In Sect. 14.3, new expressions of the exact solution of 
the matrix equation (14.1) are given. Section 14.4 is devoted to the Stein global 
Arnoldi method. The new method is developed and some theoretical results are 
given. In Sect. 14.5 we give some numerical examples to show the effectiveness of 
the proposed method for large sparse problems. 

Throughout this paper we use the following notations. For two matrices X and Y
in $\mathbb{R}^{n \times s}$, we define the inner product $\langle X, Y \rangle_F = \mathrm{trace}(X^T Y)$. The associated norm
is the Frobenius norm or F-norm denoted by $\|\cdot\|_F$. A system of vectors (matrices)
of $\mathbb{R}^{n \times s}$ is said to be F-orthonormal if it is orthonormal with respect to $\langle \cdot, \cdot \rangle_F$. The
matrices $I_n$ and $0_n$ will denote the n × n identity and the null matrices
respectively. Finally, for a matrix Z, $\|Z\|_2$ will denote the 2-norm of Z.



14.2 The Global Arnoldi Algorithm 

The global Arnoldi algorithm [11] constructs an F-orthonormal basis
$V_1, V_2, \ldots, V_m$ of the matrix Krylov subspace $\mathcal{K}_m(A, B)$, i.e.,

$$\langle V_i, V_j \rangle_F = 0 \ \text{ for } i \neq j,\ i, j = 1, \ldots, m \quad \text{and} \quad \langle V_i, V_i \rangle_F = 1.$$

We recall that the minimal polynomial P (scalar polynomial) of A with respect to
$B \in \mathbb{R}^{n \times s}$ is the nonzero monic polynomial of lowest degree such that P(A)B = 0.
The degree p of this polynomial is called the grade of B and we have $p \le n$.
The modified global Arnoldi algorithm is described as follows:
Algorithm 1

The Modified Global Arnoldi algorithm

Set $V_1 = B/\|B\|_F$
For j = 1, ..., m
    $\tilde{V} = A V_j$
    For i = 1, 2, ..., j
        $h_{i,j} = \mathrm{trace}(V_i^T \tilde{V})$
        $\tilde{V} = \tilde{V} - h_{i,j} V_i$
    End
    $h_{j+1,j} = \|\tilde{V}\|_F$
    $V_{j+1} = \tilde{V}/h_{j+1,j}$
End

Basically, the global Arnoldi algorithm is the standard Arnoldi algorithm
applied to the matrix pair $(\mathcal{A}, b)$ where $\mathcal{A} = I_s \otimes A$ and $b = \mathrm{vec}(B)$. When s = 1,
Algorithm 1 reduces to the classical Arnoldi algorithm [17]. The global Arnoldi
algorithm breaks down at step j if and only if $h_{j+1,j} = 0$ and in this case an
invariant subspace is obtained. However, a near breakdown may occur when a
subspace is A-invariant to machine precision (when, for some j, $h_{j+1,j}$ is close to
zero). We note that the global Arnoldi algorithm generates an F-orthonormal basis
of the matrix Krylov subspace $\mathcal{K}_m(A, V_1) \subset \mathcal{M}_{n,s}$, where $\mathcal{M}_{n,s}$ is the space of real
matrices having dimension n × s.

Let us now introduce some notations: $\mathcal{V}_m$ denotes the n × ms matrix $\mathcal{V}_m =
[V_1, \ldots, V_m]$. $\bar{H}_m$ is the (m + 1) × m upper Hessenberg matrix whose entries $h_{i,j}$ are
defined by Algorithm 1 and $H_m$ is the m × m matrix obtained from $\bar{H}_m$ by deleting
its last row.

With $\mathcal{V}_m$ and $H_m$ defined above, and using the Kronecker product $\otimes$, the
following relation is satisfied [11]

$$A\mathcal{V}_m = \mathcal{V}_m (H_m \otimes I_s) + h_{m+1,m} V_{m+1} E_m^T \tag{14.2}$$

and

$$A\mathcal{V}_m = \mathcal{V}_{m+1} (\bar{H}_m \otimes I_s) \tag{14.3}$$

where $E_m^T = [0_s, \ldots, 0_s, I_s]$ and $\mathcal{V}_{m+1} = [\mathcal{V}_m, V_{m+1}]$. Note that $\|V_i\|_F = 1$, i =
1, ..., m and $\|\mathcal{V}_m\|_F = \sqrt{m}$.
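The modified global Arnoldi process and the relation (14.3) can be sketched and checked numerically as follows. The function name `global_arnoldi`, the breakdown tolerance and the random test matrices are assumptions of this illustration.

```python
import numpy as np

def global_arnoldi(A, B, m, tol=1e-12):
    """Modified global Arnoldi: F-orthonormal blocks V_1..V_{m+1} and
    the (m+1) x m upper Hessenberg matrix of coefficients h_{i,j}."""
    V = [B / np.linalg.norm(B)]            # Frobenius norm of a 2-D array
    H = np.zeros((m + 1, m))
    for j in range(m):
        W = A @ V[j]
        for i in range(j + 1):
            H[i, j] = np.trace(V[i].T @ W)  # F-inner product <V_i, W>_F
            W = W - H[i, j] * V[i]
        H[j + 1, j] = np.linalg.norm(W)
        if H[j + 1, j] < tol:
            raise RuntimeError("breakdown: invariant subspace reached")
        V.append(W / H[j + 1, j])
    return V, H

rng = np.random.default_rng(0)
n, s, m = 30, 3, 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, s))
V, Hbar = global_arnoldi(A, B, m)

# Gram matrix of the blocks with respect to the F-inner product.
G = np.array([[np.trace(Vi.T @ Vj) for Vj in V] for Vi in V])
Vm = np.hstack(V[:m])                      # n x ms
Vm1 = np.hstack(V)                         # n x (m+1)s
lhs = A @ Vm
rhs = Vm1 @ np.kron(Hbar, np.eye(s))       # right-hand side of (14.3)
print(np.linalg.norm(G - np.eye(m + 1)), np.linalg.norm(lhs - rhs))
```

Only matrix products with A are needed, so a sparse A can be exploited by replacing the dense product `A @ V[j]` with a sparse one.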

We have the following properties [11].

Proposition 1 Let p ($\le n$) be the degree of the minimal polynomial of A with
respect to $V_1$, then the following statements are true.

1. The matrix Krylov subspace $\mathcal{K}_p(A, V_1)$ is invariant under A and $\mathcal{K}_m(A, V_1)$ is of
dimension m if and only if p is greater than m - 1.

2. The global Arnoldi algorithm will stop at step m if and only if m = p.

3. For $m \le p$, $\{V_1, V_2, \ldots, V_m\}$ is an F-orthonormal basis of the matrix Krylov
subspace $\mathcal{K}_m(A, V_1)$.

4. $A\mathcal{V}_p = \mathcal{V}_p (H_p \otimes I_s)$.
14.3 Expressions for the Exact Solution of the Stein Equation

In this section we will give new expressions for the solution X of the Stein
equation (14.1). If we assume that $\lambda_i(A)\lambda_j(A) \neq 1$, for i = 1, ..., n and j =
1, ..., n, where $\lambda_i(A)$ denotes the ith eigenvalue of A, then the solution X of the Eq.
(14.1) exists and is unique. Let P be the minimal polynomial of A with respect to B
of degree p:

$$P(A)B = \sum_{i=0}^{p} \alpha_i A^i B = 0, \quad \alpha_p = 1.$$

Associated with the polynomial P, we define the p × p companion matrix K

$$K = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \\ -\alpha_0 & \cdots & \cdots & -\alpha_{p-2} & -\alpha_{p-1} \end{pmatrix}.$$

M denotes the following matrix

$$M = [B, AB, \ldots, A^{p-1}B].$$

Remark that, if $I_s$ denotes the identity matrix of $\mathbb{R}^{s \times s}$, then we have

$$AM = M(K^T \otimes I_s). \tag{14.4}$$
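Relation (14.4) can be verified numerically on a small example. The sketch below assumes a generic pair (A, B) whose grade is p = n, so that the characteristic polynomial of A can play the role of the minimal polynomial P; this assumption and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, s))

# Characteristic polynomial, highest degree first: [1, c_{n-1}, ..., c_0].
coeffs = np.real(np.poly(A))
alpha = coeffs[::-1][:n]                  # alpha_0, ..., alpha_{n-1}

# Companion matrix: ones on the superdiagonal, -alpha in the last row.
K = np.zeros((n, n))
K[np.arange(n - 1), np.arange(1, n)] = 1.0
K[-1, :] = -alpha

# Block Krylov matrix M = [B, AB, ..., A^{n-1} B].
M = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
lhs = A @ M
rhs = M @ np.kron(K.T, np.eye(s))
print(np.linalg.norm(lhs - rhs))
```

The first n - 1 block columns match by a simple shift, and the last one matches because the polynomial annihilates A.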

Next, involving the minimal polynomial of A for B, we give a closed-form finite 
series representation of the solution X of the Eq. (14.1). 

Theorem 1.1 The unique solution X of (14.1) has the following representation 

^=EE^'''-./'"'^^^(^'r'' (14-5) 

where the matrix Y = (y/.j)i <i!<n '-^ ^^^ solution of 

K^TK~r + exe'\ = (), (14.6) 

and ei denotes the first vector of the canonical basis of\W. 

Proof Note first that, since the eigenvalues of $K$ are eigenvalues of $A$, we have $\lambda_i(K)\lambda_j(K) \neq 1$ for all $i, j = 1, \ldots, p$, and this implies that equation (14.6) has a unique solution. Let $Y$ be the following matrix:

$$Y = \sum_{j=1}^{p} \sum_{i=1}^{p} \gamma_{i,j}\, A^{i-1} B B^T (A^T)^{j-1},$$

where $\Gamma$ solves the low-order Stein equation (14.6). Then $Y$ can also be expressed as

$$Y = [B, AB, \ldots, A^{p-1}B]\,(\Gamma \otimes I_s) \begin{pmatrix} B^T \\ B^T A^T \\ \vdots \\ B^T (A^T)^{p-1} \end{pmatrix} = M(\Gamma \otimes I_s)M^T.$$

Let us show that the matrix $Y$ is a solution of the Stein equation (14.1). Using the relation (14.4) we obtain

$$AYA^T = AM(\Gamma \otimes I_s)M^T A^T = M(K^T \otimes I_s)(\Gamma \otimes I_s)(K \otimes I_s)M^T = M(K^T \Gamma K \otimes I_s)M^T.$$

On the other hand, since

$$B = [B, AB, A^2B, \ldots, A^{p-1}B]\,[I_s, 0_s, \ldots, 0_s]^T,$$

we have

$$BB^T = [B, AB, \ldots, A^{p-1}B]\,(e_1 \otimes I_s)(e_1^T \otimes I_s)\,[B, AB, \ldots, A^{p-1}B]^T = M(e_1 e_1^T \otimes I_s)M^T.$$

Then

$$AYA^T - Y + BB^T = M(K^T \Gamma K \otimes I_s)M^T - M(\Gamma \otimes I_s)M^T + M(e_1 e_1^T \otimes I_s)M^T = M[(K^T \Gamma K - \Gamma + e_1 e_1^T) \otimes I_s]M^T.$$

Therefore, since $\Gamma$ solves the low-order Stein equation (14.6), we obtain $AYA^T - Y + BB^T = 0$. As Eq. (14.1) has a unique solution, it follows that $X = Y$, which completes the proof.
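
The representation of Theorem 1.1 can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: it assumes a small random test pair $(A, B)$ with $\|A\|_2 = 0.5$, and it uses the characteristic polynomial of $A$ (which annihilates $A$ and hence satisfies $P(A)B = 0$ with $p = n$) in place of the minimal polynomial. The variable names are illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)
n, s = 4, 2
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, 2)          # assumption: ||A||_2 = 0.5, so rho(A) < 1
B = rng.standard_normal((n, s))

# Assumption: use the characteristic polynomial (degree p = n) as annihilating polynomial.
p = n
alpha = np.real(np.poly(A))[::-1]         # alpha[i] multiplies A^i, alpha[p] = 1

# Companion matrix K: ones on the superdiagonal, last row -alpha_0, ..., -alpha_{p-1}.
K = np.zeros((p, p))
K[np.arange(p - 1), np.arange(1, p)] = 1.0
K[-1, :] = -alpha[:p]

# M = [B, AB, ..., A^{p-1} B]
M = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(p)])

# Low-order Stein equation (14.6): K^T Gamma K - Gamma + e1 e1^T = 0.
e1 = np.zeros((p, 1))
e1[0, 0] = 1.0
Gamma = solve_discrete_lyapunov(K.T, e1 @ e1.T)

# Representation (14.5) in the compact form X = M (Gamma kron I_s) M^T.
X = M @ np.kron(Gamma, np.eye(s)) @ M.T

rel_14_4 = np.linalg.norm(A @ M - M @ np.kron(K.T, np.eye(s)))
residual = np.linalg.norm(A @ X @ A.T - X + B @ B.T)
X_ref = solve_discrete_lyapunov(A, B @ B.T)   # direct solve of (14.1) for comparison
print(rel_14_4, residual, np.linalg.norm(X - X_ref))
```

The power basis $M$ becomes ill-conditioned quickly as $p$ grows, so this check is only meaningful for very small $p$; the F-orthonormal basis of Theorem 1.2 avoids that difficulty.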

The following result states that the solution $X$ of (14.1) can be expressed in terms of the blocks $V_1, \ldots, V_p$.

Theorem 1.2 Let $p$ be the grade of $B$ and let $\mathcal{V}_p$ be the matrix defined by $\mathcal{V}_p = [V_1, \ldots, V_p]$, where the matrices $V_1, \ldots, V_p$ are constructed by the global Arnoldi algorithm with $V_1 = B/\|B\|_F$. Then the unique solution $X$ of (14.1) can be expressed as

$$X = \mathcal{V}_p(\widehat{\Gamma} \otimes I_s)\mathcal{V}_p^T, \tag{14.7}$$

where $\widehat{\Gamma}$ is the solution of the low-order Stein equation

$$H_p \widehat{\Gamma} H_p^T - \widehat{\Gamma} + \|B\|_F^2\, e_1 e_1^T = 0. \tag{14.8}$$

Proof Notice that the eigenvalues of $H_p$ are eigenvalues of $A$; then $\lambda_i(H_p)\lambda_j(H_p) \neq 1$ for all $i, j = 1, \ldots, p$, and this ensures that Eq. (14.8) has a unique solution $\widehat{\Gamma}$. Let $Y$ be the matrix defined by $Y = \mathcal{V}_p(\widehat{\Gamma} \otimes I_s)\mathcal{V}_p^T$. Then by substituting $Y$ in (14.1) and using the relations

$$B = \|B\|_F\, \mathcal{V}_p(e_1 \otimes I_s), \qquad BB^T = \|B\|_F^2\, \mathcal{V}_p(e_1 e_1^T \otimes I_s)\mathcal{V}_p^T$$

and (statement 4 of Proposition 1)

$$A\mathcal{V}_p = \mathcal{V}_p(H_p \otimes I_s),$$

we obtain

$$AYA^T - Y + BB^T = \mathcal{V}_p[(H_p \widehat{\Gamma} H_p^T - \widehat{\Gamma} + \|B\|_F^2\, e_1 e_1^T) \otimes I_s]\mathcal{V}_p^T.$$

As $\widehat{\Gamma}$ solves the low-order Stein equation (14.8), we get

$$AYA^T - Y + BB^T = 0.$$

Therefore, using the fact that the solution of (14.1) is unique, it follows that $X = Y$.

14.4 The Stein Global Arnoldi Method

Following the results of Theorem 1.2, we will see how to extract low-rank approximations to the solution $X$ of (14.1). Since the exact solution is given by the expressions (14.7) and (14.8), the approximate solution $X_m$ that we will consider is defined by

$$X_m = \mathcal{V}_m(Y_m \otimes I_s)\mathcal{V}_m^T, \tag{14.9}$$

where $Y_m$ is the symmetric $m \times m$ matrix satisfying the low-dimensional Stein equation

$$H_m Y_m H_m^T - Y_m + \|B\|_F^2\, e_1 e_1^T = 0 \tag{14.10}$$

with $e_1 = (1, 0, \ldots, 0)^T \in \mathbb{R}^m$.

From now on, we assume that for increasing $m$, $\rho(H_m) < 1$, which ensures that (14.10) has a unique solution $Y_m$. For the solution of the projected problem (14.10), we have the following result.

Theorem 1.3 Assume that $\|A\|_2 < 1$. Then, for all $m \ge 1$, the projected Stein matrix equation (14.10) has a unique solution.

Proof Notice first that if $\|A\|_2 < 1$ then $\rho(A) < 1$, which implies that the Stein matrix equation (14.1) has a unique solution. Let $\lambda_1^{(m)}$ be the largest (in modulus) eigenvalue of the matrix $H_m$ and let $\sigma_1^{(m)}$ denote its largest singular value. Then, using Weyl's inequalities [24], it is known that $|\lambda_1^{(m)}| \le \sigma_1^{(m)}$. On the other hand, it was shown in [19] that $\sigma_1^{(m)} \le \|A\|_2$. Therefore, as $\|A\|_2 < 1$, it follows that $\rho(H_m) < 1$, which implies that the projected Stein matrix equation has a unique solution.

The low-dimensional Stein equation (14.10) is solved by a standard direct method such as the Hessenberg-Schur method [1, 7]. Note that, at step $m$, the method proposed in [10], which is based on the block Arnoldi algorithm, yields a reduced-order Stein equation of dimension $ms \times ms$, while the projected Eq. (14.10) is of dimension $m \times m$. It was noted in [10] that rank degradation is often observed in the block Arnoldi process, which leads to equations of dimension strictly smaller than $ms$.

Next, we give an upper bound for the residual norm that can be used to stop the 
iterations in the Stein global Arnoldi algorithm without having to compute extra 
products involving the matrix A. We first give the following lemma to be used 
later. 

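Lemma 1 Let $\mathcal{V}_m = [V_1, \ldots, V_m]$, where $V_1, \ldots, V_m$ are the matrices generated by Algorithm 1. Let $Z = [z_{i,j}]$ be a matrix in $\mathbb{R}^{m \times r}$ and let $G = [g_{i,j}]$ be a matrix of $\mathbb{R}^{ms \times q}$, where $r$ and $q$ are any integers. Then we have

$$\|\mathcal{V}_m(Z \otimes I_s)\|_F = \|Z\|_F, \tag{14.11}$$

and

$$\|\mathcal{V}_m G\|_F \le \|G\|_F. \tag{14.12}$$

Proof If $z_{\cdot,j}$, $j = 1, \ldots, r$, denotes the $j$th column of the matrix $Z$, we have

$$\mathcal{V}_m(Z \otimes I_s) = \mathcal{V}_m[z_{\cdot,1} \otimes I_s, \ldots, z_{\cdot,r} \otimes I_s] = [\mathcal{V}_m(z_{\cdot,1} \otimes I_s), \ldots, \mathcal{V}_m(z_{\cdot,r} \otimes I_s)].$$

As $\{V_1, \ldots, V_m\}$ is F-orthonormal, it results that

$$\|\mathcal{V}_m(z_{\cdot,j} \otimes I_s)\|_F^2 = \Big\|\sum_{i=1}^{m} z_{i,j} V_i\Big\|_F^2 = \sum_{i=1}^{m} |z_{i,j}|^2 = \|z_{\cdot,j}\|_2^2, \qquad j = 1, \ldots, r,$$

and then

$$\|\mathcal{V}_m(Z \otimes I_s)\|_F^2 = \sum_{j=1}^{r} \|\mathcal{V}_m(z_{\cdot,j} \otimes I_s)\|_F^2 = \sum_{j=1}^{r} \|z_{\cdot,j}\|_2^2 = \|Z\|_F^2.$$

We express the matrix $\mathcal{V}_m G$ as $\mathcal{V}_m G = [\mathcal{V}_m g_{\cdot,1}, \ldots, \mathcal{V}_m g_{\cdot,q}]$, where $g_{\cdot,j}$ is the $j$th column of $G$. For $j = 1, \ldots, q$, the vector $\mathcal{V}_m g_{\cdot,j}$ can be written as follows:

$$\mathcal{V}_m g_{\cdot,j} = \sum_{i=1}^{m} V_i \begin{pmatrix} g_{(i-1)s+1,j} \\ \vdots \\ g_{is,j} \end{pmatrix}.$$

Now, since $\|V_i\|_F = 1$ for $i = 1, \ldots, m$, we obtain

$$\|\mathcal{V}_m g_{\cdot,j}\|_2^2 \le \sum_{i=1}^{m} \left\| \begin{pmatrix} g_{(i-1)s+1,j} \\ \vdots \\ g_{is,j} \end{pmatrix} \right\|_2^2 = \|g_{\cdot,j}\|_2^2, \qquad j = 1, \ldots, q.$$

Therefore

$$\|\mathcal{V}_m G\|_F^2 = \sum_{j=1}^{q} \|\mathcal{V}_m g_{\cdot,j}\|_2^2 \le \|G\|_F^2.$$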
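
The identity (14.11) is easy to observe numerically. The sketch below is only an illustration: `global_arnoldi` is a hypothetical helper written for this example (Algorithm 1 itself is stated earlier in the chapter), it uses modified Gram-Schmidt with the Frobenius inner product, and the test matrices are random.

```python
import numpy as np

def global_arnoldi(A, B, m):
    """Hypothetical helper: m steps of a global Arnoldi process.

    Returns the list [V_1, ..., V_{m+1}] of F-orthonormal n x s blocks and
    the (m+1) x m upper Hessenberg matrix H.
    """
    V = [B / np.linalg.norm(B, "fro")]
    H = np.zeros((m + 1, m))
    for j in range(m):
        W = A @ V[j]
        for i in range(j + 1):
            H[i, j] = np.sum(V[i] * W)     # Frobenius inner product <V_i, W>_F
            W = W - H[i, j] * V[i]
        H[j + 1, j] = np.linalg.norm(W, "fro")
        V.append(W / H[j + 1, j])          # assumes no breakdown (H[j+1, j] != 0)
    return V, H

rng = np.random.default_rng(1)
n, s, m, r = 30, 3, 5, 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, s))
V, H = global_arnoldi(A, B, m)
Vm = np.hstack(V[:m])                      # the n x (m s) matrix [V_1, ..., V_m]

# F-orthonormality: the matrix of Frobenius inner products is the identity.
gram = np.array([[np.sum(Vi * Vj) for Vj in V[:m]] for Vi in V[:m]])

# Identity (14.11): ||V_m (Z kron I_s)||_F = ||Z||_F for any m x r matrix Z.
Z = rng.standard_normal((m, r))
lhs = np.linalg.norm(Vm @ np.kron(Z, np.eye(s)), "fro")
rhs = np.linalg.norm(Z, "fro")
print(lhs, rhs)
```

Note that F-orthonormality of the blocks does not make the columns of `Vm` orthonormal in the usual sense; only the block-level Gram matrix is the identity.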



In the following theorem, we give an upper bound for the residual norm. 

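Theorem 1.4 Let $X_m$ be the approximate solution obtained, at step $m$, by the Stein global Arnoldi algorithm and let $R(X_m) = AX_mA^T - X_m + BB^T$ be the corresponding residual. Then

$$\|R(X_m)\|_F \le h_{m+1,m} \sqrt{2\|H_m \tilde{Y}_m\|_2^2 + h_{m+1,m}^2 \big(\tilde{Y}_m^{(m)}\big)^2}, \tag{14.13}$$

where $\tilde{Y}_m = Y_m e_m$ is the last column of the matrix $Y_m$ and $\tilde{Y}_m^{(m)}$ denotes the last component of $\tilde{Y}_m$.

Proof At step $m$, the residual $R(X_m)$ is written as

$$R(X_m) = A\mathcal{V}_m(Y_m \otimes I_s)\mathcal{V}_m^T A^T - \mathcal{V}_m(Y_m \otimes I_s)\mathcal{V}_m^T + BB^T.$$

Using the relations (14.2)-(14.3) and the fact that $E_m = e_m \otimes I_s$, we obtain

$$R(X_m) = \mathcal{V}_{m+1} \left( \begin{pmatrix} H_m Y_m H_m^T - Y_m + \|B\|_F^2\, e_1 e_1^T & h_{m+1,m} H_m Y_m e_m \\ h_{m+1,m}\, e_m^T Y_m H_m^T & h_{m+1,m}^2\, e_m^T Y_m e_m \end{pmatrix} \otimes I_s \right) \mathcal{V}_{m+1}^T.$$

Invoking (14.10) it follows that

$$\|R(X_m)\|_F = \left\| \mathcal{V}_{m+1} \left( \begin{pmatrix} 0 & h_{m+1,m} H_m Y_m e_m \\ h_{m+1,m}\, e_m^T Y_m H_m^T & h_{m+1,m}^2\, e_m^T Y_m e_m \end{pmatrix} \otimes I_s \right) \mathcal{V}_{m+1}^T \right\|_F.$$

Now, using the relation (14.12) of Lemma 1, we get

$$\|R(X_m)\|_F \le \left\| \left( \begin{pmatrix} 0 & h_{m+1,m} H_m Y_m e_m \\ h_{m+1,m}\, e_m^T Y_m H_m^T & h_{m+1,m}^2\, e_m^T Y_m e_m \end{pmatrix} \otimes I_s \right) \mathcal{V}_{m+1}^T \right\|_F.$$

On the other hand, we set

$$Z = \begin{pmatrix} 0 & h_{m+1,m} H_m Y_m e_m \\ h_{m+1,m}\, e_m^T Y_m H_m^T & h_{m+1,m}^2\, e_m^T Y_m e_m \end{pmatrix}$$

and

$$\alpha_m = \|(Z \otimes I_s)\mathcal{V}_{m+1}^T\|_F.$$

Note that, since $Z$ is symmetric, $\alpha_m$ is also expressed as

$$\alpha_m = \|\mathcal{V}_{m+1}(Z \otimes I_s)\|_F.$$

Then, applying (14.11), we obtain

$$\alpha_m^2 = \|Z\|_F^2 = 2\|h_{m+1,m} H_m Y_m e_m\|_2^2 + \big(h_{m+1,m}^2\, e_m^T Y_m e_m\big)^2.$$

Therefore

$$\|R(X_m)\|_F^2 \le h_{m+1,m}^2 \Big\{ 2\|H_m \tilde{Y}_m\|_2^2 + h_{m+1,m}^2 \big(\tilde{Y}_m^{(m)}\big)^2 \Big\}.$$

The result of Theorem 1.4 allows us to compute an upper bound for the residual norm without computing the residual itself, which would require the construction of the approximation $X_m$ and two matrix-matrix products. This provides a useful stopping criterion in a practical implementation of the algorithm.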

The Stein global Arnoldi algorithm is summarized as follows:

Algorithm 2: The Stein global Arnoldi algorithm

Choose a tolerance $\varepsilon > 0$, an integer parameter $k_1$
and set $k = 0$, $m = k_1$.
For $j = k + 1, k + 2, \ldots, k + k_1$
    construct the F-orthonormal basis $V_{k+1}, \ldots, V_{k+k_1}$
    and $H_m$ by Algorithm 1.
End.
Solve the low-dimensional problem:
    $H_m Y_m H_m^T - Y_m + \|B\|_F^2\, e_1 e_1^T = 0$.
Compute the upper bound for the F-norm of the residual:
    $r_m = h_{m+1,m} \sqrt{2\|H_m \tilde{Y}_m\|_2^2 + h_{m+1,m}^2 \big(\tilde{Y}_m^{(m)}\big)^2}$.
If $r_m > \varepsilon$, set $k := k + k_1$, $m = k + k_1$ and return to the For loop.
The approximate solution is given as $X_m = \mathcal{V}_m(Y_m \otimes I_s)\mathcal{V}_m^T$.
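
The steps of Algorithm 2 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' Matlab implementation: the function name `stein_global_arnoldi`, the parameter `max_steps`, the dense random test matrix with $\|A\|_2 = 0.3$, and the use of SciPy's `solve_discrete_lyapunov` for the projected equation (instead of a Hessenberg-Schur solver) are all choices made for this example. Breakdown ($h_{j+1,j} = 0$) is not handled.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def stein_global_arnoldi(A, B, eps=1e-8, k1=5, max_steps=60):
    """Sketch of Algorithm 2; returns (V_m as n x ms array, Y_m, bound r_m, m)."""
    beta = np.linalg.norm(B, "fro")
    V = [B / beta]                               # V_1 = B / ||B||_F
    H = np.zeros((max_steps + 1, max_steps))
    m, Y, r = 0, None, np.inf
    while m < max_steps:
        m_next = min(m + k1, max_steps)
        for j in range(m, m_next):               # k1 further global Arnoldi steps
            W = A @ V[j]
            for i in range(j + 1):
                H[i, j] = np.sum(V[i] * W)       # Frobenius inner product
                W = W - H[i, j] * V[i]
            H[j + 1, j] = np.linalg.norm(W, "fro")
            V.append(W / H[j + 1, j])
        m = m_next
        Hm = H[:m, :m]
        rhs = np.zeros((m, m))
        rhs[0, 0] = beta**2                      # ||B||_F^2 e_1 e_1^T
        Y = solve_discrete_lyapunov(Hm, rhs)     # projected equation (14.10)
        h = H[m, m - 1]                          # h_{m+1,m}
        y_last = Y[:, -1]                        # last column of Y_m
        r = h * np.sqrt(2.0 * np.linalg.norm(Hm @ y_last)**2 + (h * y_last[-1])**2)
        if r <= eps:
            break
    return np.hstack(V[:m]), Y, r, m

rng = np.random.default_rng(2)
n, s = 80, 2
A = rng.standard_normal((n, n))
A *= 0.3 / np.linalg.norm(A, 2)                  # assumption: ||A||_2 = 0.3
B = rng.standard_normal((n, s))

Vm, Y, r, m = stein_global_arnoldi(A, B)
X = Vm @ np.kron(Y, np.eye(s)) @ Vm.T            # X_m = V_m (Y_m kron I_s) V_m^T
true_res = np.linalg.norm(A @ X @ A.T - X + B @ B.T, "fro")
print(m, r, true_res)
```

In practice one would keep $X_m$ in factored form rather than forming the $n \times n$ matrix; it is assembled here only to compare the estimate $r_m$ with the true residual norm.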

Remarks As $m$ increases, the computation of $Y_m$ becomes expensive. To avoid this, the procedure above introduces a parameter $k_1$, to be chosen, such that the low-order Stein equation is solved every $k_1$ iterations. Note also that, when convergence is achieved, the computed approximate solution $X_m$ is stored as the product of smaller matrices. The next perturbation result shows that the approximation $X_m$ is an exact solution of a perturbed Stein equation.

Theorem 1.5 Assume that, at step $m$, the matrix $\mathcal{V}_m$ is of full rank. Then the approximate solution $X_m$ of (14.1) solves the following Stein equation:

$$(A - \Delta_m)X_m(A - \Delta_m)^T - X_m + BB^T = 0, \tag{14.14}$$

where $\Delta_m = h_{m+1,m} V_{m+1} E_m^T \mathcal{V}_m^+$ and $\mathcal{V}_m^+ = (\mathcal{V}_m^T \mathcal{V}_m)^{-1}\mathcal{V}_m^T$ is the pseudo-inverse of the matrix $\mathcal{V}_m$.

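Proof Taking the Kronecker product of (14.10) with $I_s$, we obtain

$$(H_m \otimes I_s)(Y_m \otimes I_s)(H_m^T \otimes I_s) - (Y_m \otimes I_s) + \|B\|_F^2 (e_1 \otimes I_s)(e_1^T \otimes I_s) = 0. \tag{14.15}$$

Multiplying (14.15) on the right by $\mathcal{V}_m^T$, on the left by $\mathcal{V}_m$, and using (14.2), we get

$$[A\mathcal{V}_m - h_{m+1,m} V_{m+1} E_m^T](Y_m \otimes I_s)[A\mathcal{V}_m - h_{m+1,m} V_{m+1} E_m^T]^T - \mathcal{V}_m(Y_m \otimes I_s)\mathcal{V}_m^T + \|B\|_F^2\, \mathcal{V}_m(e_1 \otimes I_s)(e_1^T \otimes I_s)\mathcal{V}_m^T = 0.$$

Now, using the fact that $\|B\|_F\, \mathcal{V}_m(e_1 \otimes I_s) = \|B\|_F V_1 = B$ and $X_m = \mathcal{V}_m(Y_m \otimes I_s)\mathcal{V}_m^T$, we obtain

$$(A - \Delta_m)X_m(A - \Delta_m)^T - X_m + BB^T = 0.$$

The next result gives an upper bound for the norm of the error $X - X_m$, where $X$ is the exact solution of (14.1).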

Theorem 1.6 Assume that $m$ steps of Algorithm 2 have been run. Let $X_m$ be the obtained approximate solution of (14.1). If $\|A\|_2 < 1$, then we have

$$\|X - X_m\|_2 \le h_{m+1,m}\, \frac{2\|H_m \tilde{Y}_m\|_2 + \big|h_{m+1,m} \tilde{Y}_m^{(m)}\big|}{1 - \|A\|_2^2},$$

where $\tilde{Y}_m$ is the last column of $Y_m$ (the solution of the low-order problem (14.10)) and $\tilde{Y}_m^{(m)}$ denotes the last component of $\tilde{Y}_m$.

Proof Note first that $\|A\|_2 < 1$ implies that $\rho(A) < 1$, which ensures that the Stein matrix equation (14.1) has a unique solution. The matrix equation (14.14) can be expressed as

$$AX_mA^T - X_m + BB^T = L_m, \tag{14.16}$$

where

$$L_m = AX_m\Delta_m^T + \Delta_m X_m A^T - \Delta_m X_m \Delta_m^T.$$

Subtracting (14.16) from the initial Stein equation (14.1), we get

$$A(X - X_m)A^T - (X - X_m) + L_m = 0.$$

Now, as $\rho(A) < 1$, the error $X - X_m$ can be written as

$$X - X_m = \sum_{i=0}^{\infty} A^i L_m (A^T)^i.$$

Hence

$$\|X - X_m\|_2 \le \|L_m\|_2 \sum_{i=0}^{\infty} \|A\|_2^{2i} = \frac{\|L_m\|_2}{1 - \|A\|_2^2}.$$

Invoking (14.9) and (14.2), $L_m$ is given by

$$L_m = h_{m+1,m}\, \mathcal{V}_m(H_m Y_m e_m \otimes I_s)V_{m+1}^T + h_{m+1,m}\, V_{m+1}(e_m^T Y_m H_m^T \otimes I_s)\mathcal{V}_m^T + h_{m+1,m}^2\, V_{m+1}(e_m^T Y_m e_m \otimes I_s)V_{m+1}^T,$$

therefore

$$\|L_m\|_2 \le \|L_m\|_F \le 2h_{m+1,m}\, \|\mathcal{V}_m(H_m Y_m e_m \otimes I_s)V_{m+1}^T\|_F + h_{m+1,m}^2\, \|V_{m+1}(e_m^T Y_m e_m \otimes I_s)V_{m+1}^T\|_F.$$

On the other hand, as $V_{m+1}$ is F-orthonormal, we have $\|V_{m+1}\|_F = 1$. Using the relation (14.11), we get $\|\mathcal{V}_m(H_m Y_m e_m \otimes I_s)\|_F = \|H_m Y_m e_m\|_2$ and finally, as $e_m^T Y_m e_m$ is a scalar, it follows that $\|V_{m+1}(e_m^T Y_m e_m \otimes I_s)\|_F = |e_m^T Y_m e_m|$, and then the result follows.

Table 14.1 Costs for the Stein global Arnoldi and Stein block Arnoldi algorithms

Cost                                            Stein block Arnoldi    Stein global Arnoldi
Matrix-vector                                   $s(m+1)$               $s(m+1)$
Mult. of blocks $n \times s$ and $s \times s$   $m(m+1)/2$
$n$-vector DOT                                  $m(m+1)s^2/2$          $m(m+1)s/2$
MGS on $n \times s$ blocks                      $m$
Solving low-order Stein equation                $O(m^3 s^3)$           $O(m^3)$

Table 14.1 lists the major work, at each iteration $m$, required by the Stein block and global Arnoldi algorithms. In addition to matrix-vector products, the Stein block Arnoldi algorithm requires the application of the modified Gram-Schmidt process on matrices of dimension $n \times s$ and the solution of Stein equations of order at most $ms$.



14.5 Numerical Examples 

All the experiments were performed on a computer with an Intel Pentium processor at 1.6 GHz and 3 GBytes of RAM using Matlab 7.4. The test matrix $A$ was divided by $\|A\|_1$. The entries of the matrix $B$ were random values uniformly distributed on $[0, 1]$. The starting block $V_1$ is $V_1 = B/\|B\|_F$.

Example 1 For the first experiment, we compared the effectiveness of the Stein global Arnoldi algorithm with the fixed point iteration method defined as follows:

$$X_0 = BB^T, \quad A_0 = A, \quad X_{m+1} = A_m X_m A_m^T + X_m \quad \text{and} \quad A_{m+1} = A_m^2, \qquad m = 0, 1, \ldots.$$

This method is known as the Squared Smith Method (SQSM) (see [6, 21]). For this experiment, the matrix $A$ was the test matrix PDE900 ($n = 900$ and $nnz(A) = 4380$) from the Harwell Boeing collection. We used $s = 4$ and $k_1 = 1$.
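
The Squared Smith recursion above is short enough to write out directly. The following sketch is illustrative only: the function name `squared_smith`, the tolerance, and the small dense test matrix with $\|A\|_2 = 0.9$ are assumptions made for this example (the experiment in the text uses PDE900), and SciPy's `solve_discrete_lyapunov` is used only as a reference solution.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def squared_smith(A, B, tol=1e-10, max_iter=60):
    """Squared Smith iteration for A X A^T - X + B B^T = 0 (requires rho(A) < 1)."""
    Q = B @ B.T
    X, Ak = Q.copy(), A.copy()                # X_0 = B B^T, A_0 = A
    res = np.inf
    for it in range(1, max_iter + 1):
        X = Ak @ X @ Ak.T + X                 # X_{m+1} = A_m X_m A_m^T + X_m
        Ak = Ak @ Ak                          # A_{m+1} = A_m^2
        res = np.linalg.norm(A @ X @ A.T - X + Q, "fro")
        if res < tol:                         # stop on the F-norm of the residual
            break
    return X, it, res

rng = np.random.default_rng(3)
n, s = 20, 4
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)               # assumption: ||A||_2 = 0.9
B = rng.random((n, s))                        # entries uniform on [0, 1]

X_sqsm, iters, res = squared_smith(A, B)
X_ref = solve_discrete_lyapunov(A, B @ B.T)
print(iters, res, np.linalg.norm(X_sqsm - X_ref))
```

Each step doubles the number of terms of the series $\sum_i A^i BB^T (A^T)^i$ captured by the iterate, at the price of dense $n \times n$ products, which is what makes the method expensive for large sparse $A$.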

We compared the CPU-time (in seconds) used for convergence with the two methods. For the SQSM method, the iterations were stopped when the F-norm of the residual $R_m = AX_mA^T - X_m + BB^T$ is less than some tolerance $tol = 10^{-[\ldots]}$, and for the Stein global Arnoldi algorithm, the iterations were stopped when the upper bound $r_m$ for the residual norm is less than $10^{-[\ldots]}$. The results are reported in Table 14.2.

Table 14.2 $n = 900$, $s = 4$ and $k_1 = 1$

                    Stein global Arnoldi       SQSM
Residual norms      $3.5 \times 10^{-[\ldots]}$    $2.7 \times 10^{-[\ldots]}$
CPU-time in sec.    4.4                        59

Example 2 The matrix $A$ is generated from the 5-point discretization of the operator

$$L(u) = \Delta u - f_1(x, y)\frac{\partial u}{\partial x} - f_2(x, y)\frac{\partial u}{\partial y} - g(x, y)u$$

on the unit square $[0, 1] \times [0, 1]$ with homogeneous Dirichlet boundary conditions. We set $f_1(x, y) = y^{[\ldots]}$, $f_2(x, y) = e^{[\ldots]}$ and $g(x, y) = xy$. The dimension of the matrix $A$ is $n = n_0^2$, where $n_0$ is the number of inner grid points in each direction.

Experiment 2.1 For this experiment, we set $n = 900$ and $s = 4$. In Fig. 14.1 (left), we plotted the true residual norm (solid line) and the upper bound (dotted line) given by Theorem 1.4, versus the number of iterations. In Fig. 14.1 (right), we plotted the norm of the error (solid line) and the corresponding upper bound (dashed line) given by Theorem 1.6. For this small example, the exact solution was computed by the Matlab function 'dlyap'.

Experiment 2.2 In the last experiment we compared the performance of the Stein global and Stein block algorithms for different values of $n$ and $s$. The iterations were stopped when the relative residual norm was less than $\varepsilon = 10^{-[\ldots]}$ for the Stein block Arnoldi algorithm and when the upper bound given in Theorem 1.4 was less than $\varepsilon \times \|R_0\|_F$ for the Stein global Arnoldi algorithm. In Table 14.3, we listed the obtained CPU-times (in seconds) and also the total number of iterations in parentheses. For the two algorithms, the projected Stein equations were solved every $k_1 = 5$ iterations.

As shown in Table 14.3, the Stein global Arnoldi algorithm is more effective than the Stein block Arnoldi solver. As $A$ is a sparse matrix, the larger CPU times needed for the Stein block Arnoldi solver [10] are attributed to the computational expenses of the block Arnoldi algorithm and to the computation of the solution of the projected Stein equation of order at most $ms$ for increasing $m$. In the Stein global Arnoldi solver, the projected problem is of order $m$.

Fig. 14.1 Left: the norm of the residual (solid) and the upper bound (dotted). Right: the norm of the error (solid) and the upper bound (dashed). The horizontal axis of both panels is the number of iterations.

Table 14.3 CPU-time (in seconds) and the total number of iterations (in parentheses)

                        Stein global Arnoldi    Stein block Arnoldi
$n = 22500$, $s = 4$    209 (265)               399 (215)
$n = 44100$, $s = 3$    572 (350)               788 (280)
$n = 52900$, $s = 2$    554 (380)               853 (355)
$n = 62500$, $s = 2$    639 (395)               912 (375)

Table 14.4 CPU-time (in seconds) for Stein global Arnoldi and GMRES(10)

Method                                              Stein global Arnoldi    GMRES(10)
$n = 400$, $N = 1.6 \times 10^5$ unknowns           10                      52
$n = 900$, $N = 8.1 \times 10^5$ unknowns           18                      85
$n = 1600$, $N = 2.56 \times 10^6$ unknowns         31                      270


Example 3. In this experiment, we compared the performance of the Stein global Arnoldi method for solving (14.1) with the restarted GMRES algorithm applied to the $n^2 \times n^2$ equivalent linear system $(A \otimes A - I_{n^2})\,vec(X) = -vec(BB^T)$. We notice that the construction of the matrix $\mathcal{A} = A \otimes A - I_{n^2}$ is very expensive for large problems. Therefore, using preconditioners such as the incomplete LU factorization is not possible in this case. So in the GMRES algorithm, we computed the matrix-vector products by using the relation $w = \mathcal{A}v \Leftrightarrow W = AVA^T - V$, where $v = vec(V)$ and $w = vec(W)$. This formulation reduces the cost and the memory requirements. The $n \times n$ matrix $A$ was the same as the one given in Example 2 and the $n \times s$ matrix $B$ was chosen to be random with $s = 4$. The iterations were stopped when the relative residual norms were less than $10^{-[\ldots]}$. We used different medium values of the dimension $n$: $n = 400$, $n = 900$ and $n = 1600$, corresponding to $N = 1.6 \times 10^5$, $N = 8.1 \times 10^5$ and $N = 2.56 \times 10^6$ unknowns, respectively. We notice that for large values of $n$, due to memory limitations, it was impossible to run the GMRES algorithm on our computer. Here also, the projected Stein matrix equation (14.10) was solved every $k_1 = 5$ iterations. In Table 14.4, we reported the CPU time (in seconds) obtained by the two approaches.
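
The matrix-free product $w = \mathcal{A}v \Leftrightarrow W = AVA^T - V$ described in Example 3 can be sketched with SciPy's `LinearOperator` and restarted `gmres`. This is a hedged illustration, not the authors' setup: the small dense random $A$ with $\|A\|_2 = 0.5$ stands in for the PDE matrix of Example 2, the restart length 10 mirrors GMRES(10), and the solver's default tolerance is used.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(4)
n, s = 20, 4
A = rng.standard_normal((n, n))
A *= 0.5 / np.linalg.norm(A, 2)           # assumption: ||A||_2 = 0.5
B = rng.random((n, s))

def stein_matvec(v):
    """Apply (A kron A - I) to v = vec(V) without forming the n^2 x n^2 matrix."""
    V = np.asarray(v).reshape((n, n), order="F")    # column-major vec
    return (A @ V @ A.T - V).ravel(order="F")

op = LinearOperator((n * n, n * n), matvec=stein_matvec, dtype=float)
b = -(B @ B.T).ravel(order="F")           # right-hand side -vec(B B^T)

x, info = gmres(op, b, restart=10, maxiter=200)
X_gmres = x.reshape((n, n), order="F")

rel_res = (np.linalg.norm(A @ X_gmres @ A.T - X_gmres + B @ B.T, "fro")
           / np.linalg.norm(B @ B.T, "fro"))
print(info, rel_res)
```

Only products with $A$ and $A^T$ are needed, so the storage is $O(n^2)$ per Krylov vector instead of $O(n^4)$ for the explicit Kronecker matrix; the Krylov vectors themselves are still dense $n^2$-vectors, which is consistent with the memory limitation mentioned above.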



14.6 Summary 



We presented in this paper a new Krylov subspace method for solving symmetric Stein matrix equations. Some new theoretical results, such as expressions for the exact solution and upper bounds for the error and residual norms, are given. The numerical tests and comparisons with other known methods show that the proposed method is effective. In conclusion, global methods are competitive for sparse matrices and should not be used for relatively dense problems. The block Arnoldi algorithm is advantageous if a moderately low number of iterations is accompanied by a high cost of matrix-vector operations with the coefficient matrix $A$.

Acknowledgements We would like to thank the referees for helpful remarks and useful 
suggestions. 

Chapter 15

Optimal Control for Linear Descriptor 

Systems with Variable Coefficients 

Peter Kunkel and Volker Mehrmann 



Abstract We study optimal control problems for general linear descriptor systems with variable coefficients. We derive necessary and sufficient optimality conditions for optimal solutions. We also show how to solve these optimality systems via the solution of generalized Riccati differential equations, and discuss how a modification of the cost functional leads to better solvability properties for the optimality system.



15.1 Introduction 

We study the linear-quadratic optimal control problem to minimize the cost functional

$$\mathcal{J}(x, u) = \frac{1}{2}x(t_f)^T M x(t_f) + \frac{1}{2}\int_{t_0}^{t_f} (x^T W x + 2x^T S u + u^T R u)\, dt, \tag{15.1}$$

subject to a general linear differential-algebraic equation (DAE), sometimes also called descriptor system, with variable coefficients of the form

$$E\dot{x} = Ax + Bu + f, \tag{15.2}$$

$$x(t_0) = x_0. \tag{15.3}$$

In this system, $x$ is the state vector and $u$ is the input (control) vector. Denoting by $\mathbb{R}^{n,k}$ the set of real $n \times k$ matrices and by $C^l(\mathbb{I}, \mathbb{R}^{n,k})$ the $l$-times continuously differentiable functions from an interval $\mathbb{I} = [t_0, t_f]$ to $\mathbb{R}^{n,k}$, we assume for the coefficients in the cost functional that $W \in C^0(\mathbb{I}, \mathbb{R}^{n,n})$, $S \in C^0(\mathbb{I}, \mathbb{R}^{n,m})$, $R \in C^0(\mathbb{I}, \mathbb{R}^{m,m})$. Furthermore, we require that $W$ and $R$ are pointwise symmetric and also that $M \in \mathbb{R}^{n,n}$ is symmetric. In the constraint descriptor system (15.2) we have coefficients $E \in C^0(\mathbb{I}, \mathbb{R}^{n,n})$, $A \in C^0(\mathbb{I}, \mathbb{R}^{n,n})$, $B \in C^0(\mathbb{I}, \mathbb{R}^{n,m})$, $f \in C^0(\mathbb{I}, \mathbb{R}^n)$, and we take $x_0 \in \mathbb{R}^n$.

Linear quadratic optimal control problems for DAEs arise in the control of mechanical multibody systems, Eich-Soellner [12] and Gerdts [14]; electrical circuits, Günther and Feldmann [15]; chemical engineering, Diehl et al. [11]; or heterogeneous systems, where different models are coupled together [26]. They usually represent local linearizations of general nonlinear control problems.

For ordinary differential equations, the theory of optimal control problems is 
well established, see, e.g., Gabasov and Kirillova [13] and Vinter [29] and the 
references therein. For systems where the constraint is a DAE, the situation is 
much more difficult and the existing literature is more recent. Results for special 
cases such as linear constant coefficient systems or special semi-explicit systems 
were e.g. obtained in Bender and Laub [5], Cobb [9], Lin and Yang [24], 
Mehrmann [25] and Pinho and Vinter [27]. 

A major difficulty in deriving adjoint equations and optimality systems is that, for the potential candidates of adjoint equations and optimality systems, existence and uniqueness of solutions and correctness of initial conditions cannot be guaranteed. See Refs. [1, 3, 10, 18, 23, 27, 28] for examples and a discussion of the difficulties.

Due to these difficulties, the standard approach to deal with optimal control 
problems for DAEs is to first perform some transformations, including regulari- 
zation and index reduction. Such transformations exist. See Refs. [1, 22, 23]. 

There also exist some papers that derive optimality conditions for specially 
structured DAE systems directly [1, 14, 23, 27, 28]. Here, we discuss general 
unstructured linear systems with variable coefficients. We follow the so-called 
strangeness index concept, see Kunkel and Mehrmann [20], and consider the 
system in a behavior setting as a general over- or underdetermined differential- 
algebraic system. 

15.2 Preliminaries

In this section we introduce some notation and recall some results on differential-algebraic equations and on optimization theory.

Throughout the paper we assume that all functions are sufficiently smooth, i.e., sufficiently often continuously differentiable. We will make frequent use of the Moore-Penrose pseudoinverse of a matrix valued function $A : \mathbb{I} \to \mathbb{R}^{k,n}$, which is the unique matrix function $A^+ : \mathbb{I} \to \mathbb{R}^{n,k}$ that satisfies the four Penrose axioms

$$AA^+A = A, \quad A^+AA^+ = A^+, \quad (AA^+)^T = AA^+, \quad (A^+A)^T = A^+A \tag{15.4}$$

pointwise, see, e.g., Campbell and Meyer [8]. Note that if $A \in C^l(\mathbb{I}, \mathbb{R}^{k,n})$ and has constant rank on $\mathbb{I}$ then $A^+ \in C^l(\mathbb{I}, \mathbb{R}^{n,k})$.
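
As a small illustration of the pointwise meaning of (15.4), the sketch below evaluates a matrix function at a few sample points and checks the four Penrose axioms with NumPy's `pinv`. The particular function `A_of_t` is a hypothetical example chosen for this sketch (it has constant rank 2 on the interval); it does not come from the text.

```python
import numpy as np

def A_of_t(t):
    """Hypothetical smooth 3 x 2 matrix function with constant rank 2."""
    return np.array([[1.0, t],
                     [0.0, 1.0],
                     [t,   0.0]])

def penrose_defects(A):
    """Return the norms of the four Penrose residuals for A and pinv(A)."""
    Ap = np.linalg.pinv(A)
    return (
        np.linalg.norm(A @ Ap @ A - A),          # A A^+ A = A
        np.linalg.norm(Ap @ A @ Ap - Ap),        # A^+ A A^+ = A^+
        np.linalg.norm((A @ Ap).T - A @ Ap),     # (A A^+)^T = A A^+
        np.linalg.norm((Ap @ A).T - Ap @ A),     # (A^+ A)^T = A^+ A
    )

# Check the axioms pointwise on a grid of the interval [0, 1].
worst = max(max(penrose_defects(A_of_t(t))) for t in np.linspace(0.0, 1.0, 11))
print(worst)
```

For a matrix function whose rank changes on the interval, the pointwise pseudoinverse still exists but is in general no longer continuous, which is why the constant rank assumption appears in the smoothness statement.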

We will generate our optimality conditions via some results from general optimization theory, see, e.g., Zeidler [30]. For this consider the optimization problem

$$\mathcal{J}(z) = \min! \tag{15.5}$$

subject to the constraint

$$\mathcal{F}(z) = 0, \tag{15.6}$$

where

$$\mathcal{J} : \mathbb{D} \to \mathbb{R}, \qquad \mathcal{F} : \mathbb{D} \to \mathbb{Y}, \qquad \mathbb{D} \subseteq \mathbb{Z} \ \text{open},$$

with real Banach spaces $\mathbb{Z}$, $\mathbb{Y}$. Let, furthermore,

$$z^* \in \mathbb{M} = \{z \in \mathbb{D} \mid \mathcal{F}(z) = 0\}.$$

Then we will make use of the following theorem.

Theorem 1.1 Let $\mathcal{J}$ be Fréchet differentiable in $z^*$ and let $\mathcal{F}$ be a submersion in $z^*$, i.e., let $\mathcal{F}$ be Fréchet differentiable in a neighborhood of $z^*$ with Fréchet derivative $D\mathcal{F}(z^*) : \mathbb{Z} \to \mathbb{Y}$ surjective and kernel $D\mathcal{F}(z^*)$ continuously projectable.

If $z^*$ is a local minimum of (15.5), then there exists a unique $\Lambda$ in the dual space $\mathbb{Y}^*$ of $\mathbb{Y}$ with

$$D\mathcal{J}(z^*)\Delta z + \Lambda(D\mathcal{F}(z^*)\Delta z) = 0 \quad \text{for all } \Delta z \in \mathbb{Z}. \tag{15.7}$$

The functional $\Lambda$ in Theorem 1.1 is called the Lagrange multiplier associated with the constraint (15.6).

In general we are interested in function representations of the Lagrange multiplier functional $\Lambda$. Such representations are obtained by the following theorem.

Theorem 1.2 Let $\mathbb{Y} = C^0(\mathbb{I}, \mathbb{R}^l) \times \mathbb{V}$ with a vector space $\mathbb{V} \subseteq \mathbb{R}^l$ and let $(\lambda, \gamma) \in \mathbb{Y}$. Then

$$\Lambda(g, r) = \int_{t_0}^{t_f} \lambda(t)^T g(t)\, dt + \gamma^T r$$

defines a linear form $\Lambda \in \mathbb{Y}^*$, which conversely uniquely determines $(\lambda, \gamma) \in \mathbb{Y}$.

A sufficient condition that guarantees that the minimum is also unique is given by the following theorem, which, e.g., covers linear-quadratic control problems with positive definite reduced Hessian.

Theorem 1.3 Suppose that $\mathcal{F} : \mathbb{Z} \to \mathbb{Y}$ is affine linear and that $\mathcal{J} : \mathbb{Z} \to \mathbb{R}$ is strictly convex on $\mathbb{M}$, i.e.,

$$\mathcal{J}(\alpha z_1 + (1 - \alpha)z_2) < \alpha \mathcal{J}(z_1) + (1 - \alpha)\mathcal{J}(z_2) \quad \text{for all } z_1, z_2 \in \mathbb{M} \text{ with } z_1 \neq z_2 \text{ and all } \alpha \in (0, 1).$$

Then the optimization problem (15.5) subject to (15.6) has a unique minimum.

For our analysis, we will make use of some basic results on DAE theory. We 
follow [20] in notation and style of presentation. 

When studying DAE control problems, one can distinguish two viewpoints. Either one takes the behavior approach and merges the variables $x$, $u$ into one vector $z$, i.e., one studies

$$\mathcal{E}\dot{z} = \mathcal{A}z + f, \tag{15.8}$$

with

$$\mathcal{E} = [E \ \ 0], \qquad \mathcal{A} = [A \ \ B] \in C^0(\mathbb{I}, \mathbb{R}^{n,n+m}).$$

For the underdetermined system (15.8) one then studies existence and uniqueness of solutions. The alternative is to keep the variables $u$ and $x$ separate. In this case one has to distinguish whether solutions exist for all controls in a given input set $\mathbb{U}$ or whether there exist controls at all for which the system is solvable, using the following solution concept.

Definition 1.4 Consider system (15.2) with a given fixed input function $u$ that is sufficiently smooth. A function $x : \mathbb{I} \to \mathbb{R}^n$ is called a solution of (15.2) if $x \in C^1(\mathbb{I}, \mathbb{R}^n)$ and $x$ satisfies (15.2) pointwise. It is called a solution of the initial value problem (15.2)-(15.3) if $x$ is a solution of (15.2) and satisfies (15.3). An initial condition (15.3) is called consistent if the corresponding initial value problem has at least one solution.

In the sequel it will be necessary to slightly weaken this solution concept. Note, 
however, that under the assumption of sufficient smoothness we will always be in 
the case of Definition 1.4. 

Definition 1.5 A control problem of the form (15.2) with a given set of controls $\mathbb{U}$ is called regular (locally with respect to a given solution $(\hat{x}, \hat{u})$ of (15.2)) if it has a unique solution for every sufficiently smooth input function $u$ in a neighborhood of $\hat{u}$ and every initial value in a neighborhood of $\hat{x}(t_0)$ that is consistent for the system with input function $u$.

In order to analyze the properties of the system, hypotheses have been formulated in Kunkel and Mehrmann [19] and Kunkel [22] which lead to an index concept, the so-called strangeness index; see Kunkel and Mehrmann [20] for a detailed derivation and analysis of this concept. Consider the constraint system in the behavior form (15.8). As in Campbell [7], we introduce a derivative array, which stacks the original equation and all its derivatives up to level $\ell$ in one large system,

$$M_\ell(t)\dot{z}_\ell = N_\ell(t)z_\ell + g_\ell(t), \tag{15.9}$$

where

$$(M_\ell)_{i,j} = \binom{i}{j}\mathcal{E}^{(i-j)} - \binom{i}{j+1}\mathcal{A}^{(i-j-1)}, \qquad i, j = 0, \ldots, \ell,$$

$$(N_\ell)_{i,j} = \begin{cases} \mathcal{A}^{(i)} & \text{for } i = 0, \ldots, \ell, \ j = 0, \\ 0 & \text{otherwise}, \end{cases}$$

$$(z_\ell)_j = z^{(j)}, \quad j = 0, \ldots, \ell, \qquad (g_\ell)_i = f^{(i)}, \quad i = 0, \ldots, \ell.$$

To characterize the solution set we require the following hypothesis, which can be proved for any linear system under some constant rank assumptions, see Kunkel and Mehrmann [20].

Hypothesis 1.6 There exist integers $\mu$, $d$, $a$, and $v$ such that the pair $(M_\mu, N_\mu)$ associated with (15.9) has the following properties:

1. For all $t \in \mathbb{I}$ we have $\mathrm{rank}\, M_\mu(t) = (\mu + 1)n - a - v$. This implies the existence of a smooth matrix function $Z$ of size $(\mu + 1)n \times (a + v)$ and pointwise maximal rank satisfying $Z^T M_\mu = 0$.

2. For all $t \in \mathbb{I}$ we have $\mathrm{rank}\, Z^T N_\mu [I_{n+m} \ 0 \ \cdots \ 0]^T = a$. This implies that without loss of generality $Z$ can be partitioned as $Z = [Z_2 \ \ Z_3]$, with $Z_2$ of size $(\mu + 1)n \times a$ and $Z_3$ of size $(\mu + 1)n \times v$, such that $\hat{\mathcal{A}}_2 = Z_2^T N_\mu [I_{n+m} \ 0 \ \cdots \ 0]^T$ has full row rank $a$ and $Z_3^T N_\mu [I_{n+m} \ 0 \ \cdots \ 0]^T = 0$. Furthermore, there exists a smooth matrix function $T_2$ of size $(n + m) \times d$, $d = n - a$, and pointwise maximal rank satisfying $\hat{\mathcal{A}}_2 T_2 = 0$.

3. For all $t \in \mathbb{I}$ we have $\mathrm{rank}\, \mathcal{E}(t)T_2(t) = d$. This implies the existence of a smooth matrix function $Z_1$ of size $n \times d$ and pointwise maximal rank satisfying $\mathrm{rank}\, \hat{\mathcal{E}}_1 = d$ with $\hat{\mathcal{E}}_1 = Z_1^T \mathcal{E}$.

The smallest $\mu$ for which this hypothesis holds is called the strangeness-index of the system, and system (15.8) has the same solution set as the reduced system

$$\begin{pmatrix} \hat{\mathcal{E}}_1 \\ 0 \\ 0 \end{pmatrix} \dot{z} = \begin{pmatrix} \hat{\mathcal{A}}_1 \\ \hat{\mathcal{A}}_2 \\ 0 \end{pmatrix} z + \begin{pmatrix} \hat{f}_1 \\ \hat{f}_2 \\ \hat{f}_3 \end{pmatrix}, \tag{15.10}$$

where $\hat{\mathcal{A}}_1 = Z_1^T \mathcal{A}$, $\hat{f}_1 = Z_1^T f$, and $\hat{f}_i = Z_i^T g_\mu$ for $i = 2, 3$.

If in this reduced system $\hat{f}_3$ does not vanish identically, then the system has no solution regardless of how the input is chosen. If $\hat{f}_3 = 0$, however, then we can just leave off the last $v$ equations. For this reason, in the following analysis we assume w.l.o.g. that $v = 0$.

We can then rewrite the reduced system again in terms of the original variables $u$, $x$ and obtain the system

$$\hat{E}\dot{x} = \hat{A}x + \hat{B}u + \hat{f}, \qquad x(t_0) = x_0, \tag{15.11}$$

where

$$\hat{E} = \begin{pmatrix} \hat{E}_1 \\ 0 \end{pmatrix}, \quad \hat{A} = \begin{pmatrix} \hat{A}_1 \\ \hat{A}_2 \end{pmatrix}, \quad \hat{B} = \begin{pmatrix} \hat{B}_1 \\ \hat{B}_2 \end{pmatrix}, \quad \hat{f} = \begin{pmatrix} \hat{f}_1 \\ \hat{f}_2 \end{pmatrix}$$

with

$$\hat{E}_1 = Z_1^T E, \quad [\hat{A}_1 \ \ \hat{B}_1] = Z_1^T [A \ \ B], \quad \hat{f}_1 = Z_1^T f, \quad [\hat{A}_2 \ \ \hat{B}_2] = Z_2^T N_\mu [I_{n+m} \ 0 \ \cdots \ 0]^T, \quad \hat{f}_2 = Z_2^T g_\mu. \tag{15.12}$$

By construction, in the reduced system (15.11), the matrix function $\hat{E}_1$ has full row rank $d$ and $[\hat{A}_2 T_2' \ \ \hat{B}_2]$ has full row rank $a$ with a matrix function $T_2'$ satisfying $\hat{E}_1 T_2' = 0$ and $T_2'^T T_2' = I_a$. Due to the fact that the solution set has not changed, one can consider the minimization of (15.1) subject to (15.11) instead of (15.2). Unfortunately, (15.11) still may not be solvable for all $u \in \mathbb{U} = C^0(\mathbb{I}, \mathbb{R}^m)$. But, since $[\hat{A}_2 T_2' \ \ \hat{B}_2]$ has full row rank, it has been shown in Kunkel et al. [22] that there exists a linear feedback

$$u = Kx + w, \tag{15.13}$$

with $K \in C^0(\mathbb{I}, \mathbb{R}^{m,n})$ such that in the closed loop system

$$\hat{E}\dot{x} = (\hat{A} + \hat{B}K)x + \hat{B}w + \hat{f}, \qquad x(t_0) = x_0, \tag{15.14}$$

the matrix function $(\hat{A}_2 + \hat{B}_2 K)T_2'$ is pointwise nonsingular, implying that the DAE in (15.14) is regular and strangeness-free for every given $w \in \mathbb{U} = C^0(\mathbb{I}, \mathbb{R}^m)$.

If we insert the feedback (15.13) in (15.11), then we obtain an optimization problem for the variables $x$, $w$ instead of $x$, $u$, and (see Ref. [21]) these problems and the solutions are directly transferable to each other. For this reason we may in the following assume w.l.o.g. that the differential-algebraic system (15.2) is regular and strangeness-free as a free system without control, i.e., when $u = 0$.

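Under these assumptions it is then known, see, e.g., Kunkel and Mehrmann [20], that there exist $P \in C^0(\mathbb{I}, \mathbb{R}^{n,n})$ and $Q \in C^1(\mathbb{I}, \mathbb{R}^{n,n})$ pointwise orthogonal such that

$$\tilde{E} = PEQ = \begin{pmatrix} E_{1,1} & 0 \\ 0 & 0 \end{pmatrix}, \qquad \tilde{A} = PAQ - PE\dot{Q} = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix},$$

$$\tilde{B} = PB = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \qquad \tilde{f} = Pf = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}, \tag{15.15}$$

$$\tilde{x} = Q^T x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \tilde{x}_0 = Q^T x_0 = \begin{pmatrix} x_{0,1} \\ x_{0,2} \end{pmatrix},$$

with $E_{1,1} \in C^0(\mathbb{I}, \mathbb{R}^{d,d})$ and $A_{2,2} \in C^0(\mathbb{I}, \mathbb{R}^{a,a})$ pointwise nonsingular. To get solvability of (15.2) for arbitrary $u \in C^0(\mathbb{I}, \mathbb{R}^m)$ and $f \in C^0(\mathbb{I}, \mathbb{R}^n)$, in view of

$$E\dot{x} = EE^+E\dot{x} = E\frac{d}{dt}(E^+Ex) - E\frac{d}{dt}(E^+E)x,$$

we have to interpret (15.2) as

$$E\frac{d}{dt}(E^+Ex) = \Big(A + E\frac{d}{dt}(E^+E)\Big)x + Bu + f, \qquad (E^+Ex)(t_0) = x_0, \tag{15.16}$$

which allows the larger solution space, see Kunkel and Mehrmann [17],

$$\mathbb{X} = C^1_{E^+E}(\mathbb{I}, \mathbb{R}^n) = \{x \in C^0(\mathbb{I}, \mathbb{R}^n) \mid E^+Ex \in C^1(\mathbb{I}, \mathbb{R}^n)\} \tag{15.17}$$

equipped with the norm

$$\|x\|_{\mathbb{X}} = \|x\|_{C^0} + \Big\|\frac{d}{dt}(E^+Ex)\Big\|_{C^0}. \tag{15.18}$$

One should note that the choice of the initial value $x_0$ is restricted by the requirement in (15.16).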



15.3 Necessary Optimality Conditions 

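In this section we derive necessary conditions for the linear quadratic optimal control problem to minimize (15.1) subject to (15.2) and (15.3). Following Kunkel and Mehrmann [17], we can use in (15.6) the constraint function

$$\mathcal{F} : \mathbb{Z} = \mathbb{X} \times \mathbb{U} \to \mathbb{Y} = C^0(\mathbb{I}, \mathbb{R}^n) \times \mathrm{range}\, E^+(t_0)E(t_0)$$

given by

$$\mathcal{F}(x, u) = \Big(E\frac{d}{dt}(E^+Ex) - \Big(A + E\frac{d}{dt}(E^+E)\Big)x - Bu - f, \ (E^+Ex)(t_0) - x_0\Big).$$

Then from (15.16) we obtain

$$PEQQ^T\frac{d}{dt}(QQ^TE^+P^TPEQQ^Tx) = \Big(PAQ + PEQQ^T\frac{d}{dt}(QQ^TE^+P^TPEQQ^T)Q\Big)Q^Tx + PBu + Pf,$$

or equivalently

$$\tilde{E}Q^T\frac{d}{dt}(Q\tilde{E}^+\tilde{E}\tilde{x}) = \Big(\tilde{A} + PP^T\tilde{E}Q^T\dot{Q} + \tilde{E}Q^T\frac{d}{dt}(Q\tilde{E}^+\tilde{E}Q^T)Q\Big)\tilde{x} + \tilde{B}u + \tilde{f}.$$

Using the product rule and cancelling equal terms on both sides we obtain

$$\tilde{E}Q^TQ\frac{d}{dt}(\tilde{E}^+\tilde{E}\tilde{x}) = \Big(\tilde{A} + \tilde{E}Q^T\dot{Q} + \tilde{E}\frac{d}{dt}(\tilde{E}^+\tilde{E}) + \tilde{E}\tilde{E}^+\tilde{E}\dot{Q}^TQ\Big)\tilde{x} + \tilde{B}u + \tilde{f}.$$

Since by definition $\tilde{E}\tilde{E}^+\tilde{E} = \tilde{E}$ and $\dot{Q}^TQ + Q^T\dot{Q} = 0$, we then obtain

$$\tilde{E}\frac{d}{dt}(\tilde{E}^+\tilde{E}\tilde{x}) = \Big(\tilde{A} + \tilde{E}\frac{d}{dt}(\tilde{E}^+\tilde{E})\Big)\tilde{x} + \tilde{B}u + \tilde{f}, \qquad (\tilde{E}^+\tilde{E}\tilde{x})(t_0) = \tilde{x}_0, \tag{15.19}$$

i.e., (15.16) transforms covariantly with pointwise orthogonal $P$ and $Q$. If we partition $P$ and $Q$ conformably to (15.15) as

$$P = \begin{pmatrix} Z'^T \\ Z^T \end{pmatrix}, \qquad Q = [T' \ \ T],$$

then $Z^TE = 0$, $ET = 0$, and we can write (15.19) as

$$\begin{pmatrix} E_{1,1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} u + \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}, \qquad x_1(t_0) = x_{0,1}.$$

Since $A_{2,2}$ is pointwise nonsingular, this system is uniquely solvable for arbitrary continuous functions $u$, $f_1$, and $f_2$, and for any $x_{0,1}$, with solution components satisfying

$$x_1 \in C^1(\mathbb{I}, \mathbb{R}^d), \qquad x_2 \in C^0(\mathbb{I}, \mathbb{R}^a)$$

such that

$$x = Q\tilde{x} = [T' \ \ T] \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{X}.$$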
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In particular, this construction defines a solution operator of the form 

5 : U X Y ^ X, {u,f,xo)^x, U = C°(I,]R"'). (15.20) 

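The Fréchet derivative $D\mathcal{F}(z)$ of $\mathcal{F}$ at $z \in \mathbb{Z} = \mathbb{X} \times \mathbb{U}$ is given by
\[
D\mathcal{F}(z)\Delta z = \Big( E\frac{d}{dt}(E^+E\Delta x) - \Big(A + E\frac{d}{dt}(E^+E)\Big)\Delta x - B\Delta u,\ (E^+E\Delta x)(t_0) \Big).
\]
For $(g, r) \in \mathbb{Y}$, the equation $D\mathcal{F}(z)\Delta z = (g, r)$ then takes the form
\[
E\frac{d}{dt}(E^+E\Delta x) - \Big(A + E\frac{d}{dt}(E^+E)\Big)\Delta x - B\Delta u = g, \qquad (E^+E\Delta x)(t_0) = r.
\]
A possible solution is given by $\Delta u = 0$ and $\Delta x = S(0, g, r)$, hence $D\mathcal{F}(z)$ is
surjective. Moreover, the kernel is given by
\[
\begin{aligned}
\operatorname{kernel}(D\mathcal{F}(z))
&= \Big\{ (\Delta x, \Delta u) \ \Big|\ E\frac{d}{dt}(E^+E\Delta x) - \Big(A + E\frac{d}{dt}(E^+E)\Big)\Delta x - B\Delta u = 0,\ (E^+E\Delta x)(t_0) = 0 \Big\} \\
&= \{ (\Delta x, \Delta u) \mid \Delta x = S(\Delta u, 0, 0),\ \Delta u \in \mathbb{U} \} \subseteq \mathbb{X} \times \mathbb{U}.
\end{aligned}
\]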

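Observe that $\operatorname{kernel}(D\mathcal{F}(z))$ is parameterized with respect to $\Delta u$ and that
\[
\mathcal{P}(z) = \mathcal{P}(x, u) = (S(u, 0, 0), u)
\]
defines a projection $\mathcal{P} : \mathbb{Z} \to \mathbb{Z}$ onto $\operatorname{kernel}(D\mathcal{F}(z))$. Here,
\[
\|(S(u,0,0), u)\|_{\mathbb{Z}} = \|S(u,0,0)\|_{\mathbb{X}} + \|u\|_{\mathbb{U}}, \quad\text{and}\quad \|S(u,0,0)\|_{\mathbb{X}} = \|x\|_{\mathbb{X}},
\]
where $x$ is the solution of the homogeneous problem
\[
E\frac{d}{dt}(E^+Ex) - \Big(A + E\frac{d}{dt}(E^+E)\Big)x - Bu = 0, \qquad (E^+Ex)(t_0) = 0. \tag{15.21}
\]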



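Replacing again $x = Q\tilde x$ as in (15.15), we can write (15.21) as
\[
\begin{bmatrix} E_{1,1} & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \dot x_1 \\ \dot x_2 \end{bmatrix}
= \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} u, \qquad x_1(t_0) = 0,
\]
or equivalently
\[
E_{1,1}\dot x_1 = (A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1})x_1 + (B_1 - A_{1,2}A_{2,2}^{-1}B_2)u, \qquad x_1(t_0) = 0, \tag{15.22}
\]
\[
x_2 = -A_{2,2}^{-1}(A_{2,1}x_1 + B_2u). \tag{15.23}
\]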
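The elimination of the algebraic component in (15.22) and (15.23) can be sketched numerically as follows. This is only an illustration at a single time point; the function name `eliminate_algebraic_part` and the sample blocks are assumptions and do not come from the text.

```python
import numpy as np

def eliminate_algebraic_part(E11, A11, A12, A21, A22, B1, B2):
    """Eliminate x2 from the block system at one time point.

    Returns (A_red, B_red, K_x, K_u) such that
        x1' = A_red @ x1 + B_red @ u    (cf. (15.22))
        x2  = K_x  @ x1 + K_u  @ u      (cf. (15.23))
    E11 and A22 are assumed to be nonsingular.
    """
    K_x = -np.linalg.solve(A22, A21)
    K_u = -np.linalg.solve(A22, B2)
    A_red = np.linalg.solve(E11, A11 + A12 @ K_x)
    B_red = np.linalg.solve(E11, B1 + A12 @ K_u)
    return A_red, B_red, K_x, K_u

# Hypothetical 1x1 blocks, chosen only for illustration.
E11 = np.array([[2.0]]); A11 = np.array([[1.0]]); A12 = np.array([[3.0]])
A21 = np.array([[4.0]]); A22 = np.array([[2.0]])
B1 = np.array([[1.0]]); B2 = np.array([[6.0]])
A_red, B_red, K_x, K_u = eliminate_algebraic_part(E11, A11, A12, A21, A22, B1, B2)
```

For time-varying coefficients the same elimination would be repeated at every time point at which the reduced ODE is evaluated.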

The variation of constants formula for the ODE in (15.22) yields the estimate
$\|x_1\|_{C^0} + \|\dot x_1\|_{C^0} \le c_1\|u\|_{\mathbb{U}}$ with a constant $c_1$, and thus $\|x_2\|_{C^0} \le c_2\|u\|_{\mathbb{U}}$ with a
constant $c_2$. Altogether, using (15.18) we then get the estimate
\[
\begin{aligned}
\|x\|_{\mathbb{X}} &= \|x\|_{C^0} + \Big\|\frac{d}{dt}(E^+Ex)\Big\|_{C^0}
= \|Q\tilde x\|_{C^0} + \Big\|\frac{d}{dt}(E^+ET'x_1)\Big\|_{C^0} \\
&= \|Q\tilde x\|_{C^0} + \Big\|\frac{d}{dt}(E^+ET')\,x_1 + (E^+ET')\,\dot x_1\Big\|_{C^0} \le c_3\|u\|_{\mathbb{U}},
\end{aligned}
\]
with a constant $c_3$. With this we have shown that $\mathcal{P}$ is continuous and thus
$\operatorname{kernel}(D\mathcal{F}(z))$ is continuously projectable. Hence, we can apply Theorem 1.2 and
obtain the existence of a unique Lagrange multiplier $\Lambda \in \mathbb{Y}^*$. To determine $\Lambda$, we
make the ansatz
\[
\Lambda(g, r) = \int_{t_0}^{t_f} \lambda^Tg\,dt + \gamma^Tr. \tag{15.24}
\]

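Using the cost function (15.1) we have
\[
DJ(z)\Delta z = x(t_f)^TM\Delta x(t_f) + \int_{t_0}^{t_f} \big(x^TW\Delta x + x^TS\Delta u + u^TS^T\Delta x + u^TR\Delta u\big)\,dt,
\]
and in a local minimum $z = (x, u)$ we obtain that for all $(\Delta x, \Delta u) \in \mathbb{X} \times \mathbb{U}$ the
relationship
\[
\begin{aligned}
0 &= x(t_f)^TM\Delta x(t_f) + \gamma^T(E^+E\Delta x)(t_0) \\
&\quad + \int_{t_0}^{t_f} \big(x^TW\Delta x + x^TS\Delta u + u^TS^T\Delta x + u^TR\Delta u\big)\,dt \\
&\quad + \int_{t_0}^{t_f} \lambda^T\Big(E\frac{d}{dt}(E^+E\Delta x) - \Big(A + E\frac{d}{dt}(E^+E)\Big)\Delta x - B\Delta u\Big)\,dt
\end{aligned} \tag{15.25}
\]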

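has to hold. If $\lambda \in C^1_{EE^+}(\mathbb{I},\mathbb{R}^n)$, then, using the fact that $E = EE^+E = (EE^+)^TE$,
we have by partial integration
\[
\begin{aligned}
\int_{t_0}^{t_f} \lambda^TE\frac{d}{dt}(E^+E\Delta x)\,dt
&= \int_{t_0}^{t_f} \lambda^T(EE^+)^TE\frac{d}{dt}(E^+E\Delta x)\,dt
= \int_{t_0}^{t_f} (EE^+\lambda)^TE\frac{d}{dt}(E^+E\Delta x)\,dt \\
&= \lambda^TEE^+E\Delta x\,\Big|_{t_0}^{t_f} - \int_{t_0}^{t_f} \frac{d}{dt}\big((EE^+\lambda)^TE\big)(E^+E\Delta x)\,dt \\
&= \lambda^TE\Delta x\,\Big|_{t_0}^{t_f} - \int_{t_0}^{t_f} \Big(\frac{d}{dt}(EE^+\lambda)^TE + (EE^+\lambda)^T\dot E\Big)(E^+E\Delta x)\,dt \\
&= \lambda^TE\Delta x\,\Big|_{t_0}^{t_f} - \int_{t_0}^{t_f} \Big(\frac{d}{dt}(EE^+\lambda)^TE\Delta x + (EE^+\lambda)^T\dot EE^+E\Delta x\Big)\,dt.
\end{aligned}
\]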



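Therefore, we can rewrite (15.25) as
\[
\begin{aligned}
0 &= \int_{t_0}^{t_f} \Big(x^TW + u^TS^T - \frac{d}{dt}(EE^+\lambda)^TE - (EE^+\lambda)^T\dot EE^+E - \lambda^TA - \lambda^TE\frac{d}{dt}(E^+E)\Big)\Delta x\,dt \\
&\quad + \int_{t_0}^{t_f} \big(x^TS + u^TR - \lambda^TB\big)\Delta u\,dt + x(t_f)^TM\Delta x(t_f) \\
&\quad + \lambda^T(t_f)E(t_f)\Delta x(t_f) - \lambda^T(t_0)E(t_0)\Delta x(t_0) + \gamma^T(E^+E\Delta x)(t_0).
\end{aligned}
\]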

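If we first choose $\Delta x = 0$ and vary over all $\Delta u \in \mathbb{U}$, then we obtain the necessary
optimality condition
\[
S^Tx + Ru - B^T\lambda = 0. \tag{15.26}
\]
Varying then over all $\Delta x \in \mathbb{X}$ with $\Delta x(t_0) = \Delta x(t_f) = 0$, we obtain the adjoint
equation
\[
Wx + Su - E^T\frac{d}{dt}(EE^+\lambda) - E^+E\dot E^TEE^+\lambda - A^T\lambda - \frac{d}{dt}(E^+E)E^T\lambda = 0. \tag{15.27}
\]
Varying finally over $\Delta x(t_0) \in \mathbb{R}^n$ and $\Delta x(t_f) \in \mathbb{R}^n$, respectively, yields the
initial condition
\[
(E^+(t_0)E(t_0))^T\gamma = E^T(t_0)\lambda(t_0), \quad\text{i.e.,}\quad \gamma = E(t_0)^T\lambda(t_0), \tag{15.28}
\]
and the end condition
\[
Mx(t_f) + E(t_f)^T\lambda(t_f) = 0, \tag{15.29}
\]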



respectively. 

Observe that the condition (15.29) can only hold if $Mx(t_f) \in \operatorname{cokernel} E(t_f)$.
This extra requirement for the cost term involving the final state was observed
already for constant coefficient systems in Mehrmann [25] and Kurina and März
[23]. If this condition on $M$ holds, then from (15.29) we obtain that
\[
\lambda(t_f) = -E^+(t_f)^TMx(t_f).
\]
Using the identity
\[
EE^+\dot EE^+E + E\frac{d}{dt}(E^+E) = EE^+\Big(\dot EE^+E + E\frac{d}{dt}(E^+E)\Big)
= EE^+\frac{d}{dt}(EE^+E) = EE^+\dot E,
\]
we obtain the initial value problem for the adjoint equation in the form
\[
E^T\frac{d}{dt}(EE^+\lambda) = Wx + Su - (A + EE^+\dot E)^T\lambda, \qquad (EE^+\lambda)(t_f) = -E^+(t_f)^TMx(t_f). \tag{15.30}
\]
As we had to interpret (15.2) in the form (15.16) for the correct choice of the
spaces, (15.30) is the correct interpretation of the problem
\[
\frac{d}{dt}(E^T\lambda) = Wx + Su - A^T\lambda, \qquad \lambda(t_f) = -E^+(t_f)^TMx(t_f). \tag{15.31}
\]

Note again that these re-interpretations are not crucial when the coefficient 
functions are sufficiently smooth. 
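The pseudoinverse identity used to pass from (15.27) to (15.30), namely $EE^+\dot EE^+E + E\frac{d}{dt}(E^+E) = EE^+\dot E$, can be checked numerically. The sketch below assumes a hypothetical rank-one $2 \times 2$ matrix function $E(t)$ built from rotations, with its Moore-Penrose inverse written in closed form, and approximates derivatives by central differences; none of these choices come from the text.

```python
import numpy as np

def rot(a):
    """Plane rotation by angle a (orthogonal matrix)."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def E(t):
    """Hypothetical smooth E(t) of constant rank 1: U(t) diag(d(t), 0) V(t)^T."""
    return rot(t) @ np.diag([1.0 + t**2, 0.0]) @ rot(2.0 * t).T

def E_plus(t):
    """Moore-Penrose inverse of E(t), written analytically: V diag(1/d, 0) U^T."""
    return rot(2.0 * t) @ np.diag([1.0 / (1.0 + t**2), 0.0]) @ rot(t).T

def ddt(fun, t, h=1e-5):
    """Central difference approximation of the derivative of a matrix function."""
    return (fun(t + h) - fun(t - h)) / (2.0 * h)

def identity_residual(t):
    """Norm of E E^+ E' E^+ E + E d/dt(E^+ E) - E E^+ E' at time t."""
    Et, Ep = E(t), E_plus(t)
    E_dot = ddt(E, t)
    P_dot = ddt(lambda s: E_plus(s) @ E(s), t)
    lhs = Et @ Ep @ E_dot @ Ep @ Et + Et @ P_dot
    rhs = Et @ Ep @ E_dot
    return np.linalg.norm(lhs - rhs)

residuals = [identity_residual(t) for t in (0.0, 0.4, 1.3)]
```

The residuals are of the size of the finite difference error, which is consistent with the identity holding exactly for smooth $E$ of constant rank.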

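For the adjoint equation and the optimality condition, we will now study the
action of the special equivalence transformations of (15.15). Using that $(EE^+)^T = EE^+$,
we obtain for (15.30) the transformed system
\[
Q^TE^TP^TP\frac{d}{dt}\big(P^TPEQQ^TE^+P^TP\lambda\big)
= Q^TWQQ^Tx + Q^TSu - \big(Q^TA^TP^T + Q^T\dot E^TP^TPEQQ^TE^+P^T\big)P\lambda.
\]
Setting
\[
\tilde W = Q^TWQ, \quad \tilde S = Q^TS, \quad \tilde\lambda = P\lambda, \quad \tilde M = Q(t_f)^TMQ(t_f),
\]
we obtain
\[
\tilde E^TP\frac{d}{dt}\big(P^T\tilde E\tilde E^+\tilde\lambda\big) = \tilde W\tilde x + \tilde Su
- \Big(\tilde A^T + \dot Q^TQ\tilde E^T + Q^T\big(\dot Q\tilde E^TP + Q\dot{\tilde E}^TP + Q\tilde E^T\dot P\big)P^T\tilde E\tilde E^+\Big)\tilde\lambda
\]
or equivalently
\[
\tilde E^TP\dot P^T\tilde E\tilde E^+\tilde\lambda + \tilde E^T\frac{d}{dt}(\tilde E\tilde E^+\tilde\lambda) = \tilde W\tilde x + \tilde Su
- \Big(\tilde A^T + \dot Q^TQ\tilde E^T + \tilde E^T\dot PP^T\tilde E\tilde E^+ + \dot{\tilde E}^T\tilde E\tilde E^+ + Q^T\dot Q\tilde E^T\tilde E\tilde E^+\Big)\tilde\lambda.
\]
Using the orthogonality of $P$, $Q$, which implies that $\dot Q^TQ + Q^T\dot Q = 0$ and
$\dot PP^T + P\dot P^T = 0$, we obtain
\[
\tilde E^T\frac{d}{dt}(\tilde E\tilde E^+\tilde\lambda) = \tilde W\tilde x + \tilde Su - \big(\tilde A + \tilde E\tilde E^+\dot{\tilde E}\big)^T\tilde\lambda.
\]
For the initial condition we obtain accordingly
\[
\begin{aligned}
(\tilde E\tilde E^+\tilde\lambda)(t_f) &= (PEQQ^TE^+P^TP\lambda)(t_f) = (PEE^+\lambda)(t_f) \\
&= -P(t_f)E^+(t_f)^TQ(t_f)Q(t_f)^TMQ(t_f)Q(t_f)^Tx(t_f) \\
&= -\tilde E^+(t_f)^T\tilde M\tilde x(t_f).
\end{aligned}
\]
Thus, we have shown that (15.30) transforms covariantly and that we may consider
(15.30) in the condensed form associated with (15.15). Setting (with conformable
partitioning)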



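\[
\tilde\lambda = \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix}, \quad
\tilde W = \begin{bmatrix} W_{1,1} & W_{1,2} \\ W_{2,1} & W_{2,2} \end{bmatrix}, \quad
\tilde S = \begin{bmatrix} S_1 \\ S_2 \end{bmatrix}, \quad
\tilde M = \begin{bmatrix} M_{1,1} & M_{1,2} \\ M_{2,1} & M_{2,2} \end{bmatrix}, \tag{15.32}
\]
we obtain the system
\[
\begin{aligned}
E_{1,1}^T\dot\lambda_1 &= W_{1,1}x_1 + W_{1,2}x_2 + S_1u - (A_{1,1} + \dot E_{1,1})^T\lambda_1 - A_{2,1}^T\lambda_2, \\
\lambda_1(t_f) &= -E_{1,1}(t_f)^{-T}\big(M_{1,1}x_1(t_f) + M_{1,2}x_2(t_f)\big), \\
0 &= W_{2,1}x_1 + W_{2,2}x_2 + S_2u - A_{1,2}^T\lambda_1 - A_{2,2}^T\lambda_2.
\end{aligned}
\]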

We immediately see that as a differential-algebraic equation in $\tilde\lambda$ this system is
strangeness-free, and, since $A_{2,2}$ is pointwise nonsingular, this system yields a
unique solution $\lambda \in C^1_{EE^+}(\mathbb{I},\mathbb{R}^n)$ for every $(x, u) \in \mathbb{Z}$.

If $(x, u) \in \mathbb{Z}$ is a local minimum, then from (15.29) and (15.30) we can
determine Lagrange multipliers $\lambda \in C^1_{EE^+}(\mathbb{I},\mathbb{R}^n)$ and $\gamma \in \operatorname{cokernel} E(t_0)$. It has
been shown in Kunkel and Mehrmann [21] that this $\lambda$ also satisfies the optimality
condition (15.26).

It thus follows that the functional that is defined via (15.24), (15.30) and
$\gamma = E(t_0)^T\lambda(t_0)$ as in (15.28) has the property (15.7) and is, therefore, the desired
Lagrange multiplier. Furthermore, it is then clear that $(z, \Lambda) = (x, u, \Lambda)$ is a local
minimum of the unconstrained optimization problem
\[
\begin{aligned}
\hat J(z, \Lambda) &= J(z) + \Lambda(\mathcal{F}(z)) \\
&= \frac12x(t_f)^TMx(t_f) + \frac12\int_{t_0}^{t_f} \big(x^TWx + 2x^TSu + u^TRu\big)\,dt \\
&\quad + \int_{t_0}^{t_f} \lambda^T\Big(E\frac{d}{dt}(E^+Ex) - \Big(A + E\frac{d}{dt}(E^+E)\Big)x - Bu - f\Big)\,dt \\
&\quad + \gamma^T\big((E^+Ex)(t_0) - x_0\big) = \min!
\end{aligned} \tag{15.33}
\]
We can summarize our analysis in the following theorem.

Theorem 1.7 Consider the optimal control problem (15.1) subject to (15.2) with a
consistent initial condition. Suppose that (15.2) is strangeness-free as a behavior
system and that $Mx(t_f) \in \operatorname{cokernel} E(t_f)$.

If $(x, u) \in \mathbb{X} \times \mathbb{U}$ is a solution to this optimal control problem, then there exists
a Lagrange multiplier function $\lambda \in C^1_{EE^+}(\mathbb{I},\mathbb{R}^n)$, such that $(x, \lambda, u)$ satisfy the
optimality boundary value problem
\[
\begin{aligned}
&\text{(a)}\quad E\frac{d}{dt}(E^+Ex) = \Big(A + E\frac{d}{dt}(E^+E)\Big)x + Bu + f, \qquad (E^+Ex)(t_0) = x_0, \\
&\text{(b)}\quad E^T\frac{d}{dt}(EE^+\lambda) = Wx + Su - (A + EE^+\dot E)^T\lambda, \qquad (EE^+\lambda)(t_f) = -E^+(t_f)^TMx(t_f), \\
&\text{(c)}\quad 0 = S^Tx + Ru - B^T\lambda.
\end{aligned} \tag{15.34}
\]
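As a plausibility check of (15.34), consider the special case $E = I$ (an ordinary differential equation). This worked specialization is added for illustration; it uses only $E^+ = I$, $E^+E = EE^+ = I$, $\dot E = 0$ and $\frac{d}{dt}(E^+E) = 0$, and recovers the classical optimality system of linear-quadratic ODE control:

```latex
\begin{aligned}
&\text{(a)}\quad \dot x = Ax + Bu + f, && x(t_0) = x_0,\\
&\text{(b)}\quad \dot\lambda = Wx + Su - A^T\lambda, && \lambda(t_f) = -Mx(t_f),\\
&\text{(c)}\quad 0 = S^Tx + Ru - B^T\lambda.
\end{aligned}
```

In this case the condition $Mx(t_f) \in \operatorname{cokernel} E(t_f)$ is automatically satisfied, since the cokernel of the identity is the whole space.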



15.4 The Strangeness Index of the Optimality System

An important question for the numerical computation of optimal controls is
when the optimality system (15.34) is regular and strangeness-free and whether
the strangeness index of (15.34) is related to the strangeness index of the
original system. For other index concepts like the tractability index this
question has been discussed in Balla et al. [2], Balla and März [4] and Kurina
and März [23].

Theorem 1.8 The DAE in (15.34) is regular and strangeness-free if and only if
\[
\hat R = \begin{bmatrix} 0 & A_{2,2} & B_2 \\ A_{2,2}^T & W_{2,2} & S_2 \\ B_2^T & S_2^T & R \end{bmatrix} \tag{15.35}
\]
is pointwise nonsingular, where we used the notation of (15.15).

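Proof Consider the reduced system (15.11) associated with the DAE (15.2) and
derive the boundary value problem (15.34) from this reduced system. If we carry
out the change of basis with orthogonal transformations leading to the normal form
(15.15), then we obtain the transformed boundary value problem
\[
\begin{aligned}
&\text{(a)}\quad E_{1,1}\dot x_1 = A_{1,1}x_1 + A_{1,2}x_2 + B_1u + f_1, \qquad x_1(t_0) = x_{0,1}, \\
&\text{(b)}\quad 0 = A_{2,1}x_1 + A_{2,2}x_2 + B_2u + f_2, \\
&\text{(c)}\quad E_{1,1}^T\dot\lambda_1 = W_{1,1}x_1 + W_{1,2}x_2 + S_1u - (A_{1,1} + \dot E_{1,1})^T\lambda_1 - A_{2,1}^T\lambda_2, \qquad \lambda_1(t_f) = -E_{1,1}(t_f)^{-T}M_{1,1}x_1(t_f), \\
&\text{(d)}\quad 0 = W_{2,1}x_1 + W_{2,2}x_2 + S_2u - A_{1,2}^T\lambda_1 - A_{2,2}^T\lambda_2, \\
&\text{(e)}\quad 0 = S_1^Tx_1 + S_2^Tx_2 + Ru - B_1^T\lambda_1 - B_2^T\lambda_2.
\end{aligned} \tag{15.36}
\]
We can rewrite (15.36) in a symmetrized way as
\[
\begin{bmatrix}
0 & E_{1,1} & 0 & 0 & 0 \\
-E_{1,1}^T & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\frac{d}{dt}
\begin{bmatrix} -\lambda_1 \\ x_1 \\ -\lambda_2 \\ x_2 \\ u \end{bmatrix}
=
\begin{bmatrix}
0 & A_{1,1} & 0 & A_{1,2} & B_1 \\
(A_{1,1} + \dot E_{1,1})^T & W_{1,1} & A_{2,1}^T & W_{1,2} & S_1 \\
0 & A_{2,1} & 0 & A_{2,2} & B_2 \\
A_{1,2}^T & W_{2,1} & A_{2,2}^T & W_{2,2} & S_2 \\
B_1^T & S_1^T & B_2^T & S_2^T & R
\end{bmatrix}
\begin{bmatrix} -\lambda_1 \\ x_1 \\ -\lambda_2 \\ x_2 \\ u \end{bmatrix}
+
\begin{bmatrix} f_1 \\ 0 \\ f_2 \\ 0 \\ 0 \end{bmatrix}. \tag{15.37}
\]
Obviously this DAE is regular and strangeness-free if and only if the symmetric
matrix function $\hat R$ is pointwise nonsingular. □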

If (15.2) with $u = 0$ is regular and strangeness-free, then $A_{2,2}$ is pointwise
nonsingular. In our analysis we have shown that this property can always be
achieved, but note that we do not need that $A_{2,2}$ is pointwise nonsingular to obtain
a regular and strangeness-free optimality system (15.34).

On the other hand, for $\hat R$ to be pointwise nonsingular, it is clearly necessary that
$[A_{2,2}\ \ B_2]$ has pointwise full row rank. This condition is equivalent to the condition
that the behavior system (15.8) belonging to the reduced problem satisfies Hypothesis
1.6 with $\hat\mu = 0$ and $\hat v = 0$, see [22] for a detailed discussion of this issue and also for
an extension of these results to the case of control systems with output equations.
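The nonsingularity test of Theorem 1.8 is easy to evaluate at a given time point. The following sketch assembles $\hat R$ from (15.35) and checks its rank; the helper names `hat_R` and `is_nonsingular` and the scalar sample blocks are assumptions made for illustration. The first sample shows a case where $A_{2,2}$ is singular but $[A_{2,2}\ B_2]$ has full row rank and $\hat R$ is nonsingular.

```python
import numpy as np

def hat_R(A22, B2, W22, S2, R):
    """Assemble the symmetric block matrix of (15.35) at one time point.

    Shapes: A22, W22 are (a, a); B2, S2 are (a, m); R is (m, m).
    """
    a = A22.shape[0]
    return np.block([
        [np.zeros((a, a)), A22, B2],
        [A22.T, W22, S2],
        [B2.T, S2.T, R],
    ])

def is_nonsingular(M, tol=1e-12):
    """Rank-based nonsingularity test for a square matrix."""
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[0]

# Hypothetical scalar blocks: A22 = 0 is singular, yet [A22 B2] = [0 1] has full row rank.
R_hat_ok = hat_R(np.array([[0.0]]), np.array([[1.0]]),
                 np.array([[1.0]]), np.array([[0.0]]), np.array([[1.0]]))
# Here [A22 B2] = [0 0] loses row rank, so hat R must be singular.
R_hat_bad = hat_R(np.array([[0.0]]), np.array([[0.0]]),
                  np.array([[1.0]]), np.array([[0.0]]), np.array([[1.0]]))
```

For a pointwise statement the check would have to be repeated on a grid of time points, which can only suggest, not prove, nonsingularity on the whole interval.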

Example 1.9 An example of a control problem of the form (15.2) that is not
directly strangeness-free in the behavior setting is discussed in Backes [1, p. 50].
This linear-quadratic control problem has the coefficients



"1 0' 




1 


, A = 


0_ 






0-1 
1 





"1" 




"0" 


, B = 


1 


, / = 
















"1 0" 




"0 0' 




"0' 





, w = 





, 5 = 





0_ 




1 




0_ 



M 



and the initial condition x\ (0) 
is given by 



/?-l, 



a, X2(0) = 0. A possible reduced system (15.11) 



B^B. 



"1 





0" 
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Observe that the corresponding free system of this reduced problem (i.e. with 
$u = 0$) itself is regular and strangeness-free. It follows that the adjoint equation
and the optimality condition are given by



'1 0" 




M 




"0 0" 




Xl 









^2 


= 







X2 


- 







>3. 




1_ 




.-^3. 





1 1 0] 









0" 




M 








1 




h 





-1 


0_ 




.^3. 



h 

A3 



respectively, with the end condition $\lambda_1(t_f) = -x_1(t_f)$.

We obtain that the matrix function $\hat R$ in (15.35) given by



R 



' 








-1 


0" 








1 











1 








1 


-1 








1 











1 





1 



is pointwise nonsingular, and hence the boundary value problem (15.34) is regular 
and strangeness-free. Moreover, it has a unique solution which is given by 



xi = a(l 



-), X2^h = 0, X3 = M = ~i2 = 



M = 



2a. 



Example 1.10 In Kurina and März [23] the optimal control problem to minimize



J{x, u) 



{xx{tf + u(tf) dt 



subject to 



d 
dt 



t' 
1 




Xl 
Xo 


J 


Q r 




Xl 


+ 


T 





M, X2(0) =X2,0 



is discussed. Obviously, $x_1$ does not enter the DAE and therefore rather plays the
role of a control than of a state. Consequently, the corresponding free system is not
regular. Rewriting the system as



t' 
1 




Xl 


= 


T 





X2(0) =X2,0, 
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'0 r 








= 


0' 
1 



and analyzing this system in our described framework, we first of all observe that
this system possesses a strangeness index and that it is even regular and
strangeness-free as a behavior system. A possible reduced system (15.11) is given by



X2(0) = X2,0. 



The corresponding free system is not regular although it is strangeness-free. 
Moreover, we can read off 



R 



which is obviously pointwise nonsingular. Hence, the boundary value problem 
(15.34) is regular and strangeness-free. 

15.5 Sufficient Conditions and the Formal Optimality 
System 

One may be tempted to drop the assumptions of Theorem 1.7 and to consider 
directly the formal optimality boundary value problem given by 



"0 





r 





1 





1 





1 



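\[
\begin{aligned}
&\text{(a)}\quad E\dot x = Ax + Bu + f, \qquad x(t_0) = x_0, \\
&\text{(b)}\quad \frac{d}{dt}(E^T\lambda) = Wx + Su - A^T\lambda, \qquad (E^T\lambda)(t_f) = -Mx(t_f), \\
&\text{(c)}\quad 0 = S^Tx + Ru - B^T\lambda.
\end{aligned} \tag{15.38}
\]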



But it was already observed in Backes [1], Kurina and März [23] and Mehrmann
[25] that it is in general not correct to just consider this system. First of all, as we
have shown, the cost matrix $M$ for the final state has to be in the correct
cokernel, since otherwise the initial value problem may not be solvable due to a
wrong number of conditions. An example for this is given in Backes [1], Kurina
and März [23]. A further difficulty arises from the fact that the formal adjoint
equation (15.38b) may not be strangeness-free in the variable $\lambda$ and thus extra
differentiability conditions may arise which may not be satisfied, see the
following example.

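Example 1.11 Consider the problem
\[
J(x, u) = \frac12\int_0^1 \big(x_1(t)^2 + u(t)^2\big)\,dt = \min!
\]
subject to the differential-algebraic system
\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \dot x_1 \\ \dot x_2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ \begin{bmatrix} 1 \\ 0 \end{bmatrix} u
+ \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}.
\]
The reduced system (15.11) in this case is the purely algebraic equation
\[
0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ \begin{bmatrix} 1 \\ 0 \end{bmatrix} u
+ \begin{bmatrix} f_1 + \dot f_2 \\ f_2 \end{bmatrix}.
\]
The associated adjoint equation (15.30) is then
\[
0 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
- \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix},
\]
and no initial conditions are necessary. The optimality condition (15.26) is given by
\[
0 = u - \lambda_1.
\]
A simple calculation yields the optimal solution
\[
x_1 = u = \lambda_1 = -\tfrac12(f_1 + \dot f_2), \qquad x_2 = -f_2, \qquad \lambda_2 = 0.
\]
If, however, we consider the formal adjoint equation (15.38b) given by
\[
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \dot\lambda_1 \\ \dot\lambda_2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
- \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix}, \qquad \lambda_1(1) = 0,
\]
together with the optimality condition (15.38c), then we obtain that
\[
x_1 = u = \lambda_1 = -\tfrac12(f_1 + \dot f_2), \qquad x_2 = -f_2, \qquad \lambda_2 = \tfrac12(\dot f_1 + \ddot f_2)
\]
without using the initial condition $\lambda_1(1) = 0$. Depending on the data, this initial
condition may be consistent or not. In view of the correct solution it is obvious that
this initial condition should not be present. But this cannot be seen from (15.38).
Moreover, the determination of $\lambda_2$ requires more smoothness of the inhomogeneity
than in (15.34).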
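The solution of Example 1.11 can be checked numerically. The sketch below assumes that the DAE of the example reads $\dot x_2 = x_1 + u + f_1$, $0 = x_2 + f_2$ (a reading of the example's coefficient matrices, stated here as an assumption) and uses hypothetical data $f_1(t) = t$, $f_2(t) = t^2$. It verifies that $x_1 = u = -\tfrac12(f_1 + \dot f_2)$, $x_2 = -f_2$ satisfies the DAE and that shifting weight between $x_1$ and $u$ along the constraint only increases the integrand.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 11)
f1, f2, df2 = t, t**2, 2.0 * t      # hypothetical data: f1 = t, f2 = t^2, f2' = 2t

c = f1 + df2                        # reduced constraint: x1 + u = -(f1 + f2')
x1 = -0.5 * c                       # candidate optimal state component
u = -0.5 * c                        # candidate optimal control
x2, dx2 = -f2, -df2                 # x2 = -f2 and its derivative

res_diff = dx2 - (x1 + u + f1)      # residual of x2' = x1 + u + f1
res_alg = x2 + f2                   # residual of 0 = x2 + f2

# Any other feasible split x1 + d, u - d keeps the constraint but raises the cost.
d = 0.3
cost_opt = x1**2 + u**2
cost_other = (x1 + d)**2 + (u - d)**2
```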

As we have demonstrated by Example 1.11, difficulties may arise by working 
with the formal adjoint equations. In particular, they may not be solvable due to 
additional initial conditions or due to lack of smoothness. If, however, the cost 
functional is positive semidefinite, then one can show that any solution of the 
formal optimality system yields a minimum and thus constitutes a sufficient 
condition. This was, e.g., shown for ODE optimal control in Campbell [6], for 
linear constant coefficient DAEs in Mehrmann [25], and in a specific setting for 
linear DAEs with variable coefficients in Backes [1]. The general result is given by 
the following theorem. 

Theorem 1.12 Consider the optimal control problem (15.1) subject to (15.2) with
a consistent initial condition and suppose that in the cost functional (15.1) we have
that
\[
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}, \qquad M
\]
are (pointwise) positive semidefinite. If $(x^*, u^*, \lambda)$ satisfies the formal optimality
system (15.38) then for any $(x, u)$ satisfying (15.2) we have
\[
J(x, u) \ge J(x^*, u^*).
\]

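Proof We consider the function
\[
\Phi(s) = J\big((1 - s)x^* + sx,\ (1 - s)u^* + su\big)
\]
and show that $\Phi(s)$ has a minimum at $s = 0$. We have
\[
\begin{aligned}
\Phi(s) &= \frac12\int_{t_0}^{t_f} \Bigg\{ (1 - s)^2
\begin{bmatrix} x^* \\ u^* \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x^* \\ u^* \end{bmatrix}
+ 2s(1 - s)
\begin{bmatrix} x^* \\ u^* \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
+ s^2
\begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
\Bigg\}\,dt \\
&\quad + \frac12\Big[ (1 - s)^2x^{*T}Mx^* + 2s(1 - s)x^{*T}Mx + s^2x^TMx \Big]_{t = t_f}
\end{aligned}
\]
and
\[
\frac{d}{ds}\Phi(0) = \int_{t_0}^{t_f} \Bigg\{
\begin{bmatrix} x^* \\ u^* \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x \\ u \end{bmatrix}
-
\begin{bmatrix} x^* \\ u^* \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x^* \\ u^* \end{bmatrix}
\Bigg\}\,dt
+ \Big( x^{*T}Mx - x^{*T}Mx^* \Big)\Big|_{t = t_f}.
\]
If we consider (15.38b) for $(x^*, u^*)$ and multiply from the left by $x^{*T}$, then we
obtain
\[
-x^{*T}\dot E^T\lambda - x^{*T}E^T\dot\lambda + x^{*T}Wx^* + x^{*T}Su^* - x^{*T}A^T\lambda = 0.
\]
Inserting the transpose of (15.38a) yields
\[
-x^{*T}\dot E^T\lambda - x^{*T}E^T\dot\lambda + x^{*T}Wx^* + x^{*T}Su^* - \dot x^{*T}E^T\lambda + u^{*T}B^T\lambda + f^T\lambda = 0.
\]
Finally, inserting (15.38c) gives
\[
-\frac{d}{dt}\big(x^{*T}E^T\lambda\big) + x^{*T}Wx^* + 2x^{*T}Su^* + u^{*T}Ru^* + f^T\lambda
= -\frac{d}{dt}\big(x^{*T}E^T\lambda\big) +
\begin{bmatrix} x^* \\ u^* \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x^* \\ u^* \end{bmatrix}
+ f^T\lambda = 0.
\]
Analogously, for $(x, u)$ we obtain the equation
\[
-\frac{d}{dt}\big(x^TE^T\lambda\big) +
\begin{bmatrix} x \\ u \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x^* \\ u^* \end{bmatrix}
+ f^T\lambda = 0.
\]
Thus, we obtain that
\[
\begin{aligned}
\frac{d}{ds}\Phi(0) &= \int_{t_0}^{t_f} \Big( \frac{d}{dt}\big(x^TE^T\lambda\big) - \frac{d}{dt}\big(x^{*T}E^T\lambda\big) \Big)\,dt + \Big( x^{*T}M(x - x^*) \Big)\Big|_{t = t_f} \\
&= \Big( (x - x^*)^TE^T\lambda \Big)\Big|_{t_0}^{t_f} + \Big( (x - x^*)^TMx^* \Big)\Big|_{t = t_f} = 0,
\end{aligned}
\]
since $x(t_0) = x^*(t_0)$ and $(E^T\lambda)(t_f) = -Mx^*(t_f)$. Due to the positive semidefiniteness
of the cost functional we have
\[
\frac{d^2}{ds^2}\Phi(0) = \int_{t_0}^{t_f}
\begin{bmatrix} x - x^* \\ u - u^* \end{bmatrix}^T
\begin{bmatrix} W & S \\ S^T & R \end{bmatrix}
\begin{bmatrix} x - x^* \\ u - u^* \end{bmatrix}\,dt
+ \Big( (x - x^*)^TM(x - x^*) \Big)\Big|_{t = t_f} \ge 0
\]
and this implies that $\Phi$ has a minimum at $s = 0$, which may, however, not be
unique. □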

We can summarize the results of this section as follows. The necessary
optimality condition for the optimal control problem (15.1) subject to (15.2) is given
by (15.34) and not by the formal optimality system (15.38). If, however, (15.38)
has a solution, then it corresponds to a minimum of the optimal control problem. If
no index reduction is performed, then a necessary condition for the DAE in (15.38)
to be regular and strangeness-free is that the DAE (15.2) itself is regular and
strangeness-free as a behavior system.



15.6 Differential-Algebraic Riccati Equations 



One of the classical approaches to solve boundary value problems arising in the
linear-quadratic optimal control problem of ordinary differential equations is
the use of Riccati differential equations. This approach has also been studied in the
case of differential-algebraic equations, see [5, 18, 25], and it has been
observed in Kunkel and Mehrmann [18] that the Riccati approach is not always
possible. If, however, some further conditions hold, then the Riccati approach can
be carried out.

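Let us first consider the optimality boundary value problem (15.34) in its
symmetrized normal form (15.37). If $\hat R$ is pointwise nonsingular, then
\[
\begin{bmatrix} -\lambda_2 \\ x_2 \\ u \end{bmatrix}
= -\hat R^{-1}\left(
\begin{bmatrix} 0 & A_{2,1} \\ A_{1,2}^T & W_{2,1} \\ B_1^T & S_1^T \end{bmatrix}
\begin{bmatrix} -\lambda_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} f_2 \\ 0 \\ 0 \end{bmatrix}
\right). \tag{15.39}
\]
The remaining equations can be written as
\[
\begin{bmatrix} E_{1,1}\dot x_1 \\ E_{1,1}^T\dot\lambda_1 \end{bmatrix}
= \begin{bmatrix} 0 & A_{1,1} \\ (A_{1,1} + \dot E_{1,1})^T & W_{1,1} \end{bmatrix}
\begin{bmatrix} -\lambda_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} 0 & A_{1,2} & B_1 \\ A_{2,1}^T & W_{1,2} & S_1 \end{bmatrix}
\begin{bmatrix} -\lambda_2 \\ x_2 \\ u \end{bmatrix}
+ \begin{bmatrix} f_1 \\ 0 \end{bmatrix}. \tag{15.40}
\]
Inserting (15.39) and defining
\[
\begin{aligned}
&\text{(a)}\quad F_1 = E_{1,1}^{-1}\big(A_{1,1} - [\,0\ \ A_{1,2}\ \ B_1\,]\hat R^{-1}[\,A_{2,1}^T\ \ W_{1,2}\ \ S_1\,]^T\big), \\
&\text{(b)}\quad G_1 = E_{1,1}^{-1}[\,0\ \ A_{1,2}\ \ B_1\,]\hat R^{-1}[\,0\ \ A_{1,2}\ \ B_1\,]^TE_{1,1}^{-T}, \\
&\text{(c)}\quad H_1 = W_{1,1} - [\,A_{2,1}^T\ \ W_{1,2}\ \ S_1\,]\hat R^{-1}[\,A_{2,1}^T\ \ W_{1,2}\ \ S_1\,]^T, \\
&\text{(d)}\quad g_1 = E_{1,1}^{-1}\big(f_1 - [\,0\ \ A_{1,2}\ \ B_1\,]\hat R^{-1}[\,f_2^T\ \ 0\ \ 0\,]^T\big), \\
&\text{(e)}\quad h_1 = -[\,A_{2,1}^T\ \ W_{1,2}\ \ S_1\,]\hat R^{-1}[\,f_2^T\ \ 0\ \ 0\,]^T,
\end{aligned}
\]
we obtain the boundary value problem with Hamiltonian structure given by
\[
\begin{aligned}
&\text{(a)}\quad \dot x_1 = F_1x_1 + G_1(E_{1,1}^T\lambda_1) + g_1, \qquad x_1(t_0) = x_{0,1}, \\
&\text{(b)}\quad \frac{d}{dt}(E_{1,1}^T\lambda_1) = H_1x_1 - F_1^T(E_{1,1}^T\lambda_1) + h_1, \qquad (E_{1,1}^T\lambda_1)(t_f) = -M_{1,1}x_1(t_f).
\end{aligned} \tag{15.41}
\]
Making the ansatz
\[
E_{1,1}^T\lambda_1 = X_{1,1}x_1 + v_1, \tag{15.42}
\]
and using its derivative
\[
\frac{d}{dt}(E_{1,1}^T\lambda_1) = \dot X_{1,1}x_1 + X_{1,1}\dot x_1 + \dot v_1,
\]
the Hamiltonian boundary value problem (15.41) yields
\[
\dot X_{1,1}x_1 + X_{1,1}\big(F_1x_1 + G_1(X_{1,1}x_1 + v_1) + g_1\big) + \dot v_1 = H_1x_1 - F_1^T(X_{1,1}x_1 + v_1) + h_1,
\]
or
\[
\big(\dot X_{1,1} + X_{1,1}F_1 + F_1^TX_{1,1} + X_{1,1}G_1X_{1,1} - H_1\big)x_1
+ \big(\dot v_1 + X_{1,1}G_1v_1 + F_1^Tv_1 + X_{1,1}g_1 - h_1\big) = 0.
\]
Thus, we can solve the two initial value problems
\[
\dot X_{1,1} + X_{1,1}F_1 + F_1^TX_{1,1} + X_{1,1}G_1X_{1,1} - H_1 = 0, \qquad X_{1,1}(t_f) = -M_{1,1}, \tag{15.43}
\]
and
\[
\dot v_1 + X_{1,1}G_1v_1 + F_1^Tv_1 + X_{1,1}g_1 - h_1 = 0, \qquad v_1(t_f) = 0, \tag{15.44}
\]
to obtain $X_{1,1}$ and $v_1$ and to decouple the solution to (15.41).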
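A Riccati initial value problem of the form (15.43), $\dot X + XF + F^TX + XGX - H = 0$ with end condition $X(t_f) = -M$, is integrated backward in time. The sketch below uses a classical fixed-step Runge-Kutta scheme and, as a simplifying assumption not made in the text, constant coefficient matrices; the function name `solve_riccati_backward` is hypothetical.

```python
import numpy as np

def solve_riccati_backward(F, G, H, M, t0, tf, steps=200):
    """Integrate X' = H - X F - F^T X - X G X from X(tf) = -M back to t0.

    Constant coefficients are assumed; classical RK4 with negative step size.
    Returns the approximation of X(t0).
    """
    def rhs(X):
        return H - X @ F - F.T @ X - X @ G @ X

    h = (t0 - tf) / steps              # negative step: backward in time
    X = -np.asarray(M, dtype=float)
    for _ in range(steps):
        k1 = rhs(X)
        k2 = rhs(X + 0.5 * h * k1)
        k3 = rhs(X + 0.5 * h * k2)
        k4 = rhs(X + h * k3)
        X = X + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return X

Z = np.zeros((1, 1))
# Linear test case: X' = 2, X(1) = -1, hence X(0) = -3.
X_lin = solve_riccati_backward(Z, Z, np.array([[2.0]]), np.array([[1.0]]), 0.0, 1.0)
# Quadratic test case: X' = -X^2, X(1) = 1, exact solution X(t) = 1/t.
X_quad = solve_riccati_backward(Z, np.array([[1.0]]), Z, np.array([[-1.0]]), 0.5, 1.0)
```

As in the ODE case, the quadratic term can cause finite escape time, so a solution on the whole interval is not guaranteed; a fixed-step scheme will not detect this reliably.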

In this way we have obtained a Riccati approach for the dynamic part of the
system. Ideally, however, we would like to have a Riccati approach directly for the
boundary value problem associated with (15.19), (15.30), and (15.26) in the
original data, without carrying out the change of bases and going to normal form. If we
make a similar ansatz for the general situation, i.e., $\lambda = Xx + v$, then we face the
problem that neither the whole $x$ nor the whole $\lambda$ may be differentiable. To
accommodate for the appropriate solution spaces, we therefore make the modified ansatz
\[
\begin{aligned}
&\text{(a)}\quad \lambda = XEx + v = XEE^+Ex + v, \\
&\text{(b)}\quad \frac{d}{dt}(EE^+\lambda) = \frac{d}{dt}(EE^+X)Ex + (EE^+X)\dot EE^+Ex + (EE^+X)E\frac{d}{dt}(E^+Ex) + \frac{d}{dt}(EE^+v),
\end{aligned} \tag{15.45}
\]
where
\[
X \in C^1_{EE^+}(\mathbb{I},\mathbb{R}^{n,n}), \qquad v \in C^1_{EE^+}(\mathbb{I},\mathbb{R}^n). \tag{15.46}
\]



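In this way we have obtained an ansatz that fits to the solution spaces for $x$ and $\lambda$.
The disadvantage of this approach, however, is that $X(I - EE^+)$ now can be
chosen arbitrarily. Using again the transformation to normal form (15.15) and that
\[
P\lambda = PXP^TPEQQ^Tx + Pv,
\]
we obtain
\[
\tilde\lambda = \tilde X\tilde E\tilde x + \tilde v
\]
with
\[
\tilde X = PXP^T = \begin{bmatrix} \tilde X_{1,1} & \tilde X_{1,2} \\ \tilde X_{2,1} & \tilde X_{2,2} \end{bmatrix}, \qquad
\tilde v = Pv = \begin{bmatrix} \tilde v_1 \\ \tilde v_2 \end{bmatrix}.
\]
Comparing with (15.42) we obtain
\[
\tilde X_{1,1} = E_{1,1}^{-T}X_{1,1}E_{1,1}^{-1}, \qquad \tilde v_1 = E_{1,1}^{-T}v_1.
\]
In particular, we obtain that $\tilde X_{1,1}$ and $\tilde v_1$ are continuously differentiable under
the assumption that (15.43) is solvable on the interval $\mathbb{I}$. Furthermore, $\tilde X_{1,1}$ is
pointwise symmetric. From (15.39) we then obtain
\[
\lambda_2 = [\,I\ \ 0\ \ 0\,]\hat R^{-1}\left(
\begin{bmatrix} 0 & A_{2,1} \\ A_{1,2}^T & W_{2,1} \\ B_1^T & S_1^T \end{bmatrix}
\begin{bmatrix} -(\tilde X_{1,1}E_{1,1}x_1 + \tilde v_1) \\ x_1 \end{bmatrix}
+ \begin{bmatrix} f_2 \\ 0 \\ 0 \end{bmatrix}
\right)
= \tilde X_{2,1}E_{1,1}x_1 + \tilde v_2,
\]
with
\[
\tilde X_{2,1}E_{1,1} = [\,I\ \ 0\ \ 0\,]\hat R^{-1}
\begin{bmatrix} 0 & A_{2,1} \\ A_{1,2}^T & W_{2,1} \\ B_1^T & S_1^T \end{bmatrix}
\begin{bmatrix} -\tilde X_{1,1}E_{1,1} \\ I \end{bmatrix}
\]
and
\[
\tilde v_2 = [\,I\ \ 0\ \ 0\,]\hat R^{-1}
\begin{bmatrix} f_2 \\ -A_{1,2}^T\tilde v_1 \\ -B_1^T\tilde v_1 \end{bmatrix}.
\]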

If we assume that $R$ itself is pointwise nonsingular (which corresponds to the
assumption that all controls are weighted in the cost functional), then from (15.26)
we obtain that
\[
u = R^{-1}(B^T\lambda - S^Tx)
\]
and thus from (15.30) and (15.21) we obtain
\[
\begin{aligned}
&E^T\frac{d}{dt}(EE^+X)Ex + E^T(EE^+X)\dot EE^+Ex + E^T\frac{d}{dt}(EE^+v) \\
&\quad + E^T(EE^+X)\Big(Ax + E\frac{d}{dt}(E^+E)x + BR^{-1}B^T(XEx + v) - BR^{-1}S^Tx + f\Big) \\
&= Wx + SR^{-1}B^T(XEx + v) - SR^{-1}S^Tx - (A + EE^+\dot E)^T(XEx + v),
\end{aligned}
\]
or
\[
\begin{aligned}
&\Big(E^T\frac{d}{dt}(EE^+X)E + E^T(EE^+X)\dot EE^+E + E^T(EE^+X)E\frac{d}{dt}(E^+E) + \dot E^T(EE^+X)E \\
&\qquad + E^TXA + E^TXBR^{-1}B^TXE + A^TXE - E^TXBR^{-1}S^T - SR^{-1}B^TXE + SR^{-1}S^T - W\Big)x \\
&\quad + \Big(E^T\frac{d}{dt}(EE^+v) + E^TXBR^{-1}B^Tv + E^TXf - SR^{-1}B^Tv + A^Tv + \dot E^T(EE^+v)\Big) = 0.
\end{aligned}
\]
Introducing the notation
\[
\begin{aligned}
&\text{(a)}\quad F = A - BR^{-1}S^T, \\
&\text{(b)}\quad G = BR^{-1}B^T, \\
&\text{(c)}\quad H = W - SR^{-1}S^T,
\end{aligned} \tag{15.47}
\]
we obtain
\[
\Big(\frac{d}{dt}\big(E^T(EE^+X)E(E^+E)\big) + E^TXF + F^TXE + E^TXGXE - H\Big)x
+ \Big(\frac{d}{dt}(E^Tv) + E^TXGv + F^Tv + E^TXf\Big) = 0,
\]
which yields the two initial value problems
\[
\frac{d}{dt}(E^TXE) + E^TXF + F^TXE + E^TXGXE - H = 0, \qquad (E^TXE)(t_f) = -M, \tag{15.48}
\]
and
\[
\frac{d}{dt}(E^Tv) + E^TXGv + F^Tv + E^TXf = 0, \qquad (E^Tv)(t_f) = 0. \tag{15.49}
\]
Note that we must have $M = E(t_f)^T\tilde ME(t_f)$ with suitable $\tilde M$ and $H = E^T\tilde HE$
with suitable $\tilde H$ as necessary condition for the solvability of (15.48). Note also
that (as already in the case of ODEs) the optimality boundary value problem
(15.34) may be solvable, whereas (15.48) does not allow for a solution on the
whole interval $\mathbb{I}$.

The analysis in this section shows that we can obtain a Riccati approach if 
the system (15.2) is strangeness-free in the behavior setting and if R is 
invertible. 
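The coefficient matrices in (15.47) are straightforward to form once $R$ is invertible. The following sketch evaluates them at one time point; the function name `riccati_coefficients` and the sample data are assumptions made for illustration.

```python
import numpy as np

def riccati_coefficients(A, B, W, S, R):
    """Return F = A - B R^{-1} S^T, G = B R^{-1} B^T, H = W - S R^{-1} S^T.

    R is assumed to be nonsingular; linear solves are used instead of an
    explicit inverse.
    """
    Rinv_ST = np.linalg.solve(R, S.T)
    Rinv_BT = np.linalg.solve(R, B.T)
    F = A - B @ Rinv_ST
    G = B @ Rinv_BT
    H = W - S @ Rinv_ST
    return F, G, H

# Hypothetical data with two states and one control.
A = np.eye(2)
B = np.array([[0.0], [1.0]])
W = np.eye(2)
S = np.array([[1.0], [0.0]])
R = np.array([[2.0]])
F, G, H = riccati_coefficients(A, B, W, S, R)
```

With symmetric $W$ and $R$, both $G$ and $H$ come out symmetric, which is what the Hamiltonian structure of the boundary value problem requires.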



15.7 A Modified Cost Functional 

In the previous sections we have derived necessary conditions for the linear-quadratic
control problem and studied how these can be solved. In particular, we have seen
that extra conditions on the cost functional have to hold for the optimality system
or the associated Riccati equation to have a solution.

Since the cost functional is often a matter of choice, one could modify it to
reduce the requirements. A simple modification is the following cost functional,
see [16] in the case of constant coefficients,
\[
\tilde J(x, u) = \frac12x(t_f)^T\tilde Mx(t_f) + \frac12\int_{t_0}^{t_f} \big(x^T\tilde Wx + 2x^T\tilde Su + u^TRu\big)\,dt, \tag{15.50}
\]
with $\tilde M = E(t_f)^TME(t_f)$, $\tilde W = E^TWE$, and $\tilde S = E^TS$.

Assuming again that the original system (15.2) is strangeness-free as a behavior
system, the same analysis as before leads to the modified optimality boundary
value problem
\[
\begin{aligned}
&\text{(a)}\quad E\frac{d}{dt}(E^+Ex) = \Big(A + E\frac{d}{dt}(E^+E)\Big)x + Bu + f, \qquad (E^+Ex)(t_0) = x_0, \\
&\text{(b)}\quad E^T\frac{d}{dt}(EE^+\lambda) = E^TWEx + E^TSu - (A + EE^+\dot E)^T\lambda, \qquad (EE^+\lambda)(t_f) = -E^+(t_f)^TE(t_f)^TME(t_f)x(t_f), \\
&\text{(c)}\quad 0 = S^TEx + Ru - B^T\lambda.
\end{aligned} \tag{15.51}
\]

Considering the conditions that guarantee that the optimality system is again
strangeness-free, we obtain the following corollary.

Corollary 1.13 Consider the optimal control problem to minimize (15.50) subject
to (15.2) and assume that (15.2) is strangeness-free as a free system (with $u = 0$).
Then the optimality system (15.51) is strangeness-free if and only if $R$ is pointwise
nonsingular.

Proof Consider the system (15.2) in the normal form (15.15). By assumption, we
have that $A_{2,2}$ is invertible and in the transformed cost functional (15.32) we obtain
that $\tilde S_2 = 0$ and $\tilde W_{2,2} = 0$. The modified matrix $\hat R$ then takes the form
\[
\hat R = \begin{bmatrix} 0 & A_{2,2} & B_2 \\ A_{2,2}^T & 0 & 0 \\ B_2^T & 0 & R \end{bmatrix},
\]
which is clearly pointwise nonsingular if and only if $R$ is pointwise nonsingular. □

The Riccati approach also changes when we use the modified cost functional.
In particular, we obtain
\[
\begin{aligned}
&\text{(a)}\quad \tilde F = A - BR^{-1}\tilde S^T, \\
&\text{(b)}\quad \tilde G = G = BR^{-1}B^T, \\
&\text{(c)}\quad \tilde H = \tilde W - \tilde SR^{-1}\tilde S^T.
\end{aligned} \tag{15.52}
\]
In this case one obtains the two initial value problems
\[
\frac{d}{dt}(E^TXE) + E^TX\tilde F + \tilde F^TXE + E^TX\tilde GXE - \tilde H = 0, \qquad (E^TXE)(t_f) = -\tilde M, \tag{15.53}
\]
and
\[
\frac{d}{dt}(E^Tv) + E^TX\tilde Gv + \tilde F^Tv + E^TXf = 0, \qquad (E^Tv)(t_f) = 0. \tag{15.54}
\]
Observe that the necessary conditions for solvability as stated in the end of Sect.
15.6 for (15.48) are now trivially fulfilled.

15.8 Conclusions

We have presented the optimal control theory for general unstructured linear 
systems of differential-algebraic equations with variable coefficients. We have 
derived necessary and sufficient conditions as well as a Riccati approach. We have 
also shown how the cost function may be modified to guarantee that the optimality 
system is regular and strangeness-free. 

The presented results can be generalized to nonlinear control problems, and
suitable numerical methods for the solution of the optimality system are presented
in Kunkel and Mehrmann [21].

Chapter 16
Robust Pole Assignment for Ordinary and Descriptor Systems via the Schur Form

Tiexiang Li, Eric King-wah Chu and Wen-Wei Lin



Abstract In Chu (Syst Control Lett 56:303-314, 2007), the pole assignment
problem was considered for the control system $\dot x = Ax + Bu$ with linear
state-feedback $u = Fx$. An algorithm using the Schur form has been proposed,
producing good suboptimal solutions which can be refined further using optimization.
In this paper, the algorithm is improved, incorporating the minimization of the
feedback gain $\|F\|$. It is also extended for the pole assignment of the descriptor
system $E\dot x = Ax + Bu$ with linear state- and derivative-feedback $u = Fx - G\dot x$.
Newton refinement for the solutions is discussed and several illustrative numerical
examples are presented.



16.1 Introduction 

Let $(A, B)$ denote the ordinary system
\[
\dot x = Ax + Bu \tag{16.1}
\]
with the open-loop system matrix $A \in \mathbb{R}^{n \times n}$ and input matrix $B \in \mathbb{R}^{n \times m}$ $(n > m)$.
The state-feedback pole assignment problem (SFPAP) seeks a control matrix
$F \in \mathbb{R}^{m \times n}$ such that the closed-loop system matrix $A_c = A + BF$ has prescribed
eigenvalues or poles. Equivalently, we are seeking a control matrix $F$ such that
\[
(A + BF)X = X\Lambda \tag{16.2}
\]
for some given $\Lambda$ with desirable poles and nonsingular matrix $X$. Notice that $\Lambda$
does not have to be in Jordan form, and $X$ can be well-conditioned even with
defective multiple eigenvalues in some well-chosen $\Lambda$.

The SFPAP is solvable for arbitrary closed-loop spectrum when $(A, B)$ is
controllable, i.e., when $[sI - A, B]$ $(\forall s \in \mathbb{C})$ or $[B, AB, \ldots, A^{n-1}B]$ have full rank.
The problem has been thoroughly investigated; see [6, 7, 17], the references
therein, or any standard textbook in control theory. It is well known that the
single-input case $(m = 1)$ has a unique solution, while the multi-input case has some
degrees of freedom left in the problem. A notable effort in utilizing these degrees
of freedom sensibly was made by Kautsky et al. [13], with the conditioning of the
closed-loop spectrum (including $\|X^{-1}\|_F$ where $X$ contains the normalized
closed-loop eigenvectors) being optimized.

When solving a pole assignment problem with a particular robustness measure
optimized, we call it a robust pole assignment problem (RPAP). It is important to
realize that there are many RPAPs, with different robustness measures. In this
paper, a weighted sum of the departure from normality and the feedback gain is
used as the robustness measure. For other possibilities and a comparison of
different robustness measures, see [6, 7].

In [7], $(\Lambda, X)$ in Schur form is chosen together with the upper triangular part of
$\Lambda$ (the departure from normality) minimized. The resulting non-iterative algorithm
SCHUR produces a suboptimal solution $F$ for any given $x_1$ (the first Schur vector in
$X$). In this paper, this original SCHUR algorithm is improved, minimizing a
weighted sum of the departure from normality and the feedback gain. The SCHUR
algorithm is also extended for the pole assignment of the descriptor system
\[
E\dot x = Ax + Bu \tag{16.3}
\]
with linear state- and derivative-feedback $u = Fx - G\dot x$; for the solvability and
other algorithms for the problem, see [5, 8, 12, 18]. Furthermore, a Newton
refinement procedure is implemented, using the suboptimal but feasible solution
produced by SCHUR as starting point. Note that a common problem with Newton's
method is the lack of a feasible starting point which is close enough to the (locally)
optimal solution. This SCHUR-NEWTON algorithm will produce a better solution,
utilizing the freedom in $x_1$.

The main contributions of this paper are as follows. For the RPAP, it is not
appropriate to compare different algorithms built on different robustness measures.
An algorithm is only "better" for a particular measure, which is unlikely to be the
sole consideration in the control design. However, our algorithm is well-conditioned
in the sense that the Schur form, which is well-conditioned or differentiable even
for multiple eigenvalues, is utilized. In addition, the robustness measure is a
flexible weighted sum of departure from normality and feedback gains.

After the introduction here, Sect. 16.2 quotes the theorems on the departure from
normality measures for ordinary and descriptor systems. Sections 16.3 and 16.4 present
the SCHUR-NEWTON algorithms for ordinary and descriptor systems. Section 16.5
contains some numerical examples and Sect. 16.6 some concluding remarks. The
(generalized) spectrum is denoted by $\sigma(\cdot)$ and $(A)_{ij}$ is the $(i, j)$ entry in $A$.



16.2 Departure from Normality 

Consider the Schur decomposition $A_c = A + BF = X\Lambda X^H$ with $\Lambda = D + N$ where $N$
is the off-diagonal part of $\Lambda$. The departure from normality measure $\Delta_\nu(A) = \|N\|_\nu$ was
first considered by Henrici [11] and the following perturbation result was produced:

Theorem 2.1 (Henrici Theorem [11]) Let $A, \delta A \in \mathbb{C}^{n \times n}$, $\delta A \ne 0$, $\mu \in \sigma(A + \delta A)$
and let $\|\cdot\|_\nu$ be any norm stronger than the spectral norm (with $\|M\|_2 \le \|M\|_\nu$
for all $M$). Then
\[
\min_{\lambda \in \sigma(A)} |\lambda - \mu| \le \frac{\eta}{g(\eta)}\|\delta A\|_\nu, \qquad \eta = \frac{\Delta_\nu(A)}{\|\delta A\|_\nu},
\]
where $g(\eta)$ is the only positive root of $g + g^2 + \cdots + g^n = \eta$ $(\eta > 0)$.
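The bound of Theorem 2.1 can be evaluated numerically. The sketch below makes two assumptions that are not taken from the text: it uses the Frobenius norm as $\|\cdot\|_\nu$ (which dominates the spectral norm), for which the departure from normality can be computed from the eigenvalues via $\|N\|_F^2 = \|A\|_F^2 - \sum_i |\lambda_i|^2$, and it finds $g(\eta)$ by bisection. The function names are hypothetical.

```python
import numpy as np

def departure_from_normality_fro(A):
    """Frobenius-norm departure from normality: sqrt(||A||_F^2 - sum |lambda_i|^2)."""
    lam = np.linalg.eigvals(A)
    val = np.linalg.norm(A, "fro") ** 2 - np.sum(np.abs(lam) ** 2)
    return np.sqrt(max(val, 0.0))

def g_root(eta, n, iters=200):
    """Unique positive root of g + g^2 + ... + g^n = eta (eta > 0), by bisection."""
    lo, hi = 0.0, max(1.0, eta)        # the polynomial is increasing and p(hi) >= eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(mid ** k for k in range(1, n + 1)) < eta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def henrici_bound(A, dA):
    """Right-hand side (eta / g(eta)) * ||dA||_F of the Henrici estimate."""
    n = A.shape[0]
    norm_dA = np.linalg.norm(dA, "fro")
    delta = departure_from_normality_fro(A)
    if delta == 0.0:
        return norm_dA                 # limit eta / g(eta) -> 1 for normal matrices
    eta = delta / norm_dA
    return eta / g_root(eta, n) * norm_dA

A = np.array([[1.0, 2.0], [0.0, 3.0]])
dA = np.array([[0.0, 0.0], [0.1, 0.0]])
bound = henrici_bound(A, dA)
distances = [np.min(np.abs(np.linalg.eigvals(A) - mu))
             for mu in np.linalg.eigvals(A + dA)]
```

For this sample, $\Delta_F(A) = 2$, $\eta = 20$ and $g(\eta) = 4$, so the bound equals $0.5$, while the perturbed eigenvalues move by less than $0.1$.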

Other related perturbation results involving the departure from normality
measure $\Delta_\nu(A)$ can be found in [1, 9, 15].

For generalized eigenvalue problems [16], we have this generalization [1, 9,
11, 15]:

Definition 2.1 Let $\{A, E\}$ define a regular matrix pencil and $\mathcal{T}_{\{A,E\}}$ be the set of
all pairs of transformations $\{Z, U\}$ which satisfy the following conditions:

1. $Z, U \in \mathbb{C}^{n \times n}$, $Z$ is nonsingular and $U$ is unitary;

2. $Z^{-1}AU$ and $Z^{-1}EU$ are both upper triangular; and

3. $|(Z^{-1}AU)_{ii}|^2 + |(Z^{-1}EU)_{ii}|^2 = 1$ $(i = 1, \ldots, n)$.

Let $(Z, U) \in \mathcal{T}_{\{A,E\}}$ and $\operatorname{diag}(A) \in \mathbb{C}^{n \times n}$ denote the diagonal matrix sharing the
diagonal of $A$. Denote
\[
\mu(Z, U) = \big\| \big[ Z^{-1}AU - \operatorname{diag}(Z^{-1}AU),\ Z^{-1}EU - \operatorname{diag}(Z^{-1}EU) \big] \big\|_2
\]
and
\[
\Delta_2(A, E) = \inf_{(Z,U) \in \mathcal{T}_{\{A,E\}}} \mu(Z, U).
\]
Then $\Delta_2(A, E)$ is called the departure from normality measure of $\{A, E\}$.




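Definition 2.2 Let $\sigma(A, E) = \{(\alpha_i, \beta_i)\}$ denote the spectrum of the pencil $\alpha A - \beta E$ 
and let $(\tilde\alpha_j, \tilde\beta_j) \in \sigma(\tilde A, \tilde E)$. The spectral variation of $(\tilde A, \tilde E)$ from $(A, E)$ equals 

$$s_{(A,E)}(\tilde A, \tilde E) = \max_j \{ s_{(\tilde\alpha_j, \tilde\beta_j)} \}, \qquad s_{(\tilde\alpha, \tilde\beta)} = \min_i \{ |\tilde\alpha \beta_i - \tilde\beta \alpha_i| \}.$$

Theorem 2.2 (Henrici Theorem [16]) Let $\{A,E\}$ and $\{\tilde A, \tilde E\}$ define regular 
pencils of the same dimension, $\Delta_2(A,E)$ is the departure from normality measure 
of $\{A,E\}$, and suppose $\Delta_2(A,E) \neq 0$. Let $W = (A, E)$ and $\tilde W = (\tilde A, \tilde E)$, then 

$$s_{(A,E)}(\tilde A, \tilde E) \le \frac{\eta}{g(\eta)} \, [1 + \Delta_2(A,E)] \, d_2(W, \tilde W)$$

where $d_2(W, \tilde W) = \| \sin\Theta(W, \tilde W) \|_2$ denotes the distance between $W$ and $\tilde W$, 

$$\eta = \frac{\Delta_2(A,E)}{[1 + \Delta_2(A,E)] \, d_2(W, \tilde W)},$$

and $g(\eta)$ is the unique nonnegative root of $g + g^2 + \cdots + g^n = \eta$ $(\eta > 0)$. 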

Based on Theorems 2.1 and 2.2, we shall minimize the departure from nor- 
mality (as part of the robustness measure) of the closed-loop matrix pencil in the 
effort to control the robustness of the closed-loop spectrum or system. 

16.3 Ordinary System 

Similar to the development in [7] we have 

$$(A + BF)X = X\Lambda \qquad (16.4)$$

assuming without loss of generality that the input matrix $B$ has full rank and 
possesses the QR decomposition 

$$B = [Q_1, Q_2] \, [R_B^T, 0]^T = Q \, [R_B^T, 0]^T = Q_1 R_B. \qquad (16.5)$$

Pre-multiplying the eigenvalue equation (16.4) by $B^{\dagger} = R_B^{-1} Q_1^T$ and $Q_2^T$, we obtain 

$$Q_2^T (AX - X\Lambda) = 0, \qquad F = R_B^{-1} Q_1^T (X \Lambda X^{-1} - A). \qquad (16.6)$$

For a given $\Lambda$, we can select $X$ from 

$$\left[ I_n \otimes (Q_2^T A) - \Lambda^T \otimes Q_2^T \right] v(X) = 0$$

where $\otimes$ denotes the Kronecker product [9] and $v(X)$ stacks the columns of $X$ 
[which is different from Stk($\cdot$) defined later]. For the selected $X$, the solution to the 
SFPAP can then be obtained using (16.6). 
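
The relations (16.5) and (16.6) can be checked on synthetic data. The sketch below is illustrative only: instead of solving the pole assignment problem, it builds a consistent instance by choosing an orthogonal $X$, an upper triangular $\Lambda$ and a feedback `F_true` (all hypothetical test data), defines $A$ so that (16.4) holds, and then recovers the feedback from (16.6) and verifies the Kronecker-product constraint on $v(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2

# Hypothetical test data: B, orthogonal X, upper triangular Lam, feedback F_true.
B = rng.standard_normal((n, m))
X, _ = np.linalg.qr(rng.standard_normal((n, n)))
Lam = np.triu(rng.standard_normal((n, n)))
F_true = rng.standard_normal((m, n))
A = X @ Lam @ X.T - B @ F_true          # ensures (A + B F_true) X = X Lam

# QR decomposition (16.5): B = [Q1, Q2] [R_B; 0].
Q, R = np.linalg.qr(B, mode="complete")
Q1, Q2, R_B = Q[:, :m], Q[:, m:], R[:m, :]

def vec(M):
    """Stack the columns of M, as v(.) does in the text."""
    return M.flatten(order="F")

# Constraint on X: [I (x) (Q2^T A) - Lam^T (x) Q2^T] v(X) = 0.
K = np.kron(np.eye(n), Q2.T @ A) - np.kron(Lam.T, Q2.T)
constraint_residual = np.linalg.norm(K @ vec(X))

# Feedback from (16.6); X is orthogonal, so X^{-1} = X^T.
F = np.linalg.solve(R_B, Q1.T @ (X @ Lam @ X.T - A))
```

Here `constraint_residual` is at rounding level and `F` coincides with `F_true`, as expected since $Q_2^T B = 0$ and $Q_1^T Q_1 = I$.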
16.3.1 Real Eigenvalues 

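First consider the real case when all the eigenvalues in $\Lambda$ are real; i.e., let the real 
$X = [x_1, \ldots, x_n]$ and $\Lambda = D + N$ where $D = \mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}$, $N$ is the strictly upper 
triangular part of $\Lambda$ with columns $[\eta_j^T, 0^T]^T$ and $\eta_j \in \mathbb{R}^{j-1}$ $(j = 1, \ldots, n)$. Equation 
(16.4) and the properties of Schur decompositions imply 

$$\begin{bmatrix} Q_2^T (A - \lambda_j I_n) & -Q_2^T X_{-j} \\ X_{-j}^T & 0 \end{bmatrix} \begin{bmatrix} x_j \\ \eta_j \end{bmatrix} = 0, \qquad j = 1, \ldots, n. \qquad (16.7)$$

Let $S_j = [S_{j1}^T, S_{j2}^T]^T$ be a unitary basis of the above subspace, so that 

$$x_j = S_{j1} u_j, \qquad \eta_j = S_{j2} u_j. \qquad (16.8)$$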



The Schur vectors $x_j$ can then be selected from the subspaces defined in (16.8) with 
$\|\eta_j\|$ (and thus $\|N\|_F$) minimized using the GSVD [3, 4, 7]. When $j = 1$, $\eta_1$ is 
degenerate and there will be no minimization to control $x_1$, making it a free 
parameter. To overcome this freedom, we can optimize some additional robustness 
measure. The feedback gain $\|F\|$, which is of some importance in many engineering 
applications, is then a natural choice. Making use of the orthogonality of $X$ 
in (16.6), we have 

$$Y = [y_1, \ldots, y_n] = FX = B^{\dagger} (X\Lambda - AX), \qquad \|Y\|_F = \|F\|_F$$

and 

$$y_j = B^{\dagger} \left[ \lambda_j I_n - A, \; X_{-j} \right] \begin{bmatrix} x_j \\ \eta_j \end{bmatrix} = S_{j3} u_j$$

with 

$$S_{j3} = B^{\dagger} \left[ \lambda_j I_n - A, \; X_{-j} \right] S_j, \qquad X_{-j} = [x_1, \ldots, x_{j-1}].$$

Incorporating the departure from normality and feedback gain into one weighted 
robustness measure $r(X, N) = \omega_1^2 \|F\|_F^2 + \omega_2^2 \|N\|_F^2$, we have 

$$r(X, N) = \sum_{j=1}^{n} u_j^T \left( \omega_1^2 S_{j3}^T S_{j3} + \omega_2^2 S_{j2}^T S_{j2} \right) u_j.$$

Other more sophisticated scaling schemes can be achieved in a similar fashion. For 
example, different $\eta_j$ and $y_j$ can have different weights, or $(\omega_1, \omega_2)$ can be set to 
$(1, 0)$ (thus ignoring $\|N\|$). 

For $j = 1, \ldots, n$, the vectors $x_j$ and $\eta_j$ can then be chosen by minimizing $r(X, N)$ 
while scaling $x_j$ to be a unit vector, by considering the GSVD [3, 4] of $(S_{j1}, S_{j4})$, 
with $S_{j4} = [\omega_1 S_{j3}^T, \omega_2 S_{j2}^T]^T$. Finally, a more direct and efficient way to construct 
$S_{jk}$ $(k = 1, 2, 3)$ will be to consider, directly from (16.4) and similar to (16.7), 

$$\begin{bmatrix} A - \lambda_j I_n & B & -X_{-j} \\ X_{-j}^T & 0 & 0 \end{bmatrix} \begin{bmatrix} x_j \\ y_j \\ \eta_j \end{bmatrix} = 0$$

so that the columns of $[S_{j1}^T, S_{j2}^T, S_{j3}^T]^T$ form a (unitary) basis for the above subspace, 
with the feedback matrix retrievable from $F = YX^T$. We are then required to 
choose $u_j$ such that 

$$\min_{u_j} \frac{u_j^T S_{j4}^T S_{j4} u_j}{u_j^T S_{j1}^T S_{j1} u_j}, \qquad S_{j4} = \begin{bmatrix} \omega_1 S_{j2} \\ \omega_2 S_{j3} \end{bmatrix}.$$
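
One column step of this direct construction can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the null space is extracted with an SVD rather than QR, and the GSVD is replaced by an equivalent symmetric eigenproblem obtained from a Cholesky factor of $S_{j1}^T S_{j1}$, which requires $S_{j1}$ to have full column rank (the GSVD does not need this). The function name and the test data are hypothetical.

```python
import numpy as np

def schur_column_step(A, B, lam, X_prev, w1=1.0, w2=1.0, tol=1e-12):
    """Pick (x_j, y_j, eta_j) in the null space of the direct construction.

    Minimizes (w1^2 ||y||^2 + w2^2 ||eta||^2) / ||x||^2 and returns x with unit norm.
    Assumes S1 (the x-rows of the null-space basis) has full column rank.
    """
    n, m = B.shape
    k = X_prev.shape[1]
    M = np.block([
        [A - lam * np.eye(n), B, -X_prev],
        [X_prev.T, np.zeros((k, m)), np.zeros((k, k))],
    ])
    # Orthonormal null-space basis from the trailing right singular vectors.
    _, s, Vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol * s[0]))
    S = Vh[rank:].T
    S1, S2, S3 = S[:n], S[n:n + m], S[n + m:]
    S4 = np.vstack([w1 * S2, w2 * S3])
    # Generalized Rayleigh quotient via Cholesky: S1^T S1 = L L^T.
    L = np.linalg.cholesky(S1.T @ S1)
    L_inv = np.linalg.inv(L)
    _, V = np.linalg.eigh(L_inv @ (S4.T @ S4) @ L_inv.T)
    u = L_inv.T @ V[:, 0]              # eigenvector of the smallest eigenvalue
    return S1 @ u, S2 @ u, S3 @ u

# Hypothetical data: second column (j = 2) with X_prev = e_1 and pole -2.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 2))
X_prev = np.eye(4)[:, :1]
x, y, eta = schur_column_step(A, B, -2.0, X_prev)
```

The returned triple satisfies $Ax_j + By_j - \lambda_j x_j - X_{-j}\eta_j = 0$ with $x_j$ of unit length and orthogonal to the previous Schur vectors.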



From the initial point obtained via the above SCHUR algorithm, we use New- 
ton's algorithm to optimize the problem. Note that the freedom in the order of the 
poles remains to be utilized. 

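Optimization Problem 1 

$$\min \; \omega_1^2 \|Y\|_F^2 + \omega_2^2 \|N\|_F^2 \quad \text{s.t.} \quad \begin{cases} AX + BY - X(D + N) = 0 \\ X^T X - I = 0 \end{cases}$$

where $N$ is $n \times n$ strictly upper triangular and $X$ is $n \times n$ orthogonal. 
Optimization Problem 1 is equivalent to: 

$$\min \; \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \, \mathrm{Stk}(N)^T \mathrm{Stk}(N)$$

$$\text{s.t.} \quad \begin{cases} (I \otimes A) v(X) + (I \otimes B) v(Y) - (D^T \otimes I) v(X) - (N^T \otimes I) v(X) = 0 \\ d_0(X)^T v(X) - \mathrm{Stk}(I) = 0 \end{cases}$$

where 

$$\mathrm{Stk}(N) = [\eta_2^T, \ldots, \eta_n^T]^T = [\eta_{1,2} \,|\, \eta_{1,3}, \eta_{2,3} \,|\, \cdots \,|\, \eta_{1,n}, \ldots, \eta_{n-1,n}]^T \in \mathbb{R}^{p}, \qquad p = n(n-1)/2,$$

which stacks the nontrivial elements of $\eta_j$ $(j = 2, \ldots, n)$. Here, we write 
$C = [c_1, \ldots, c_n] \in \mathbb{R}^{k \times n}$, so 

$$d_0(C) = c_1 \oplus [c_1, c_2] \oplus \cdots \oplus [c_1, \ldots, c_n] \in \mathbb{R}^{kn \times q}, \qquad q = n(n+1)/2.$$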

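We then consider the Lagrange function of the Optimization Problem 1: 

$$\begin{aligned} L(\gamma, \delta, v(X), v(Y), \mathrm{Stk}(N)) = {} & \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \, \mathrm{Stk}(N)^T \mathrm{Stk}(N) \\ & + \gamma^T \left[ (I \otimes A) v(X) + (I \otimes B) v(Y) - (D^T \otimes I) v(X) - (N^T \otimes I) v(X) \right] \\ & + \delta^T \left[ d_0(X)^T v(X) - \mathrm{Stk}(I) \right] \end{aligned}$$

where $\gamma = [\gamma_1^T, \gamma_2^T, \ldots, \gamma_n^T]^T$, $R = [\gamma_1, \gamma_2, \ldots, \gamma_n]$, and 

$$\delta = [\delta_{11} \,|\, \delta_{21}, \delta_{22} \,|\, \cdots \,|\, \delta_{n1}, \delta_{n2}, \ldots, \delta_{nn}]^T \in \mathbb{R}^{q}, \qquad q = n(n+1)/2.$$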



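The derivatives of $L$ satisfy 

$$f_1 \equiv \frac{\partial L}{\partial \gamma} = (I \otimes A - D^T \otimes I - N^T \otimes I) v(X) + (I \otimes B) v(Y) = 0,$$

$$f_2 \equiv \frac{\partial L}{\partial \delta} = d_0(X)^T v(X) - \mathrm{Stk}(I) = 0,$$

$$f_3 \equiv \frac{\partial L}{\partial v(X)} = (I \otimes A^T - D \otimes I - N \otimes I) \gamma + v(X\Delta) = 0, \qquad (16.9)$$

$$f_4 \equiv \frac{\partial L}{\partial v(Y)} = (I \otimes B^T) \gamma + 2\omega_1^2 v(Y) = 0, \qquad (16.10)$$

$$f_5 \equiv \frac{\partial L}{\partial \mathrm{Stk}(N)} = 2\omega_2^2 \, \mathrm{Stk}(N) - d_1(X)^T \gamma = 0, \qquad (16.11)$$

where $\Delta_{ij} = \delta_{ij}$ $(i \neq j)$, $\Delta_{ii} = 2\delta_{ii}$ and 

$$d_1(C) = 0 \oplus c_1 \oplus [c_1, c_2] \oplus \cdots \oplus [c_1, \ldots, c_{n-1}] \in \mathbb{R}^{kn \times p}.$$

We apply Newton's method to $f \equiv (f_1^T, \ldots, f_5^T)^T = 0$: 

$$\left[ \gamma^T, \delta^T, v(X)^T, v(Y)^T, \mathrm{Stk}(N)^T \right]^T_{\mathrm{new}} = \left[ \gamma^T, \delta^T, v(X)^T, v(Y)^T, \mathrm{Stk}(N)^T \right]^T - J_f^{-1} f \qquad (16.12)$$

with the symmetric 

$$J_f = \begin{bmatrix} 0 & 0 & I \otimes A - D^T \otimes I - N^T \otimes I & I \otimes B & -d_1(X) \\ & 0 & d_0(X)^T + d_2(X^T) & 0 & 0 \\ & & \Delta \otimes I & 0 & -d_3([\gamma_2, \ldots, \gamma_n]) \\ & & & 2\omega_1^2 I & 0 \\ & & & & 2\omega_2^2 I \end{bmatrix}$$

(only the upper triangular blocks are shown), where $d_2(C^T) \in \mathbb{R}^{q \times kn}$ satisfies 
$[d_0(X) + d_2(X^T)^T] \, \delta = v(X\Delta)$, and, for $C = [c_2, \ldots, c_n]$, 

$$d_3(C) = \left[ \begin{matrix} c_2 \\ 0 \end{matrix} \;\middle|\; \begin{matrix} c_3 \oplus c_3 \\ 0 \end{matrix} \;\middle|\; \begin{matrix} c_4 \oplus c_4 \oplus c_4 \\ 0 \end{matrix} \;\middle|\; \cdots \;\middle|\; \begin{matrix} c_n \oplus \cdots \oplus c_n \\ 0 \end{matrix} \right] \in \mathbb{R}^{kn \times p},$$

in which $c_j$ is repeated $j - 1$ times in the $j$th block column. 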
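
The structure of the iteration (16.12), Newton's method applied to the gradient of a Lagrangian with a symmetric Jacobian, can be seen on a much smaller problem. The sketch below is a toy illustration, not Optimization Problem 1: it assumes the hypothetical problem $\min x^2 + y^2$ subject to $xy - 1 = 0$, with a single multiplier $\gamma$ and unknown vector $z = (x, y, \gamma)$.

```python
import numpy as np

def grad_lagrangian(z):
    """f(z): gradient of L = x^2 + y^2 + gamma * (x*y - 1) w.r.t. (x, y, gamma)."""
    x, y, gamma = z
    return np.array([2.0 * x + gamma * y, 2.0 * y + gamma * x, x * y - 1.0])

def hess_lagrangian(z):
    """J_f(z): symmetric Jacobian of f, i.e. the Hessian of L."""
    x, y, gamma = z
    return np.array([
        [2.0, gamma, y],
        [gamma, 2.0, x],
        [y, x, 0.0],
    ])

def newton(z0, tol=1e-12, max_iter=50):
    """Iterate z <- z - J_f(z)^{-1} f(z) until ||f(z)|| < tol."""
    z = np.array(z0, dtype=float)
    for _ in range(max_iter):
        f = grad_lagrangian(z)
        if np.linalg.norm(f) < tol:
            break
        z = z - np.linalg.solve(hess_lagrangian(z), f)
    return z

z_star = newton([1.2, 0.8, -1.5])   # converges to (1, 1, -2)
```

As in the chapter, the starting point need not satisfy the constraint; the iteration drives the constraint residual and the stationarity conditions to zero together, provided the start is close enough to a solution.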
We can now present the algorithm for the RPAP with real eigenvalues: 

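Algorithm 1 

1. Use the Schur Form to find the initial $X_0$, $Y_0$ and $N_0$. 

2. Substitute $X_0$, $Y_0$, $N_0$ into (16.9), (16.10) and (16.11) to construct: 

$$\begin{bmatrix} I \otimes A^T - D \otimes I - N_0 \otimes I & d_0(X_0) + d_2(X_0^T)^T \\ I \otimes B^T & 0 \\ d_1(X_0)^T & 0 \end{bmatrix} \begin{bmatrix} \gamma_0 \\ \delta_0 \end{bmatrix} = \begin{bmatrix} 0 \\ -2\omega_1^2 v(Y_0) \\ 2\omega_2^2 \, \mathrm{Stk}(N_0) \end{bmatrix} \qquad (16.13)$$

   Solve the over-determined system (16.13) in the least squares sense for $\gamma_0$ and $\delta_0$. 

3. Use $[\gamma_0^T, \delta_0^T, v(X_0)^T, v(Y_0)^T, \mathrm{Stk}(N_0)^T]^T$ as the initial point and run Newton's 
iteration as in (16.12) until convergence to $X$, $Y$, $N$. 

4. Substitute the $X$, $N$ into (16.6) to obtain the feedback matrix $F$. 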
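
Step 2 of Algorithm 1 can be sketched without forming the operators $d_0$, $d_1$, $d_2$ explicitly. The code below is a hedged illustration: it assumes the matrix forms of (16.9)-(16.11), namely $A^T R - R(D+N)^T + X\Delta$, $B^T R + 2\omega_1^2 Y$ and $2\omega_2^2 \mathrm{Stk}(N) - \mathrm{Stk}(X^T R)$ with $R = [\gamma_1, \ldots, \gamma_n]$, builds the coefficient matrix of the linear system in $(\gamma, \delta)$ by probing with unit vectors, and solves it with `numpy.linalg.lstsq`. All function names and the random test data are hypothetical.

```python
import numpy as np

def stk(M):
    """Stack the strictly upper triangular entries of M column by column."""
    n = M.shape[1]
    return np.concatenate([M[:j, j] for j in range(1, n)])

def stationarity_residual(z, X, Y, N, D, A, B, w1, w2):
    """Residuals (16.9)-(16.11) as a function of z = [gamma; delta]."""
    n = X.shape[0]
    R = z[:n * n].reshape((n, n), order="F")     # R = [gamma_1, ..., gamma_n]
    Delta = np.zeros((n, n))
    Delta[np.tril_indices(n)] = z[n * n:]        # delta_11 | delta_21, delta_22 | ...
    Delta = Delta + Delta.T                      # Delta_ii = 2 delta_ii
    f3 = A.T @ R - R @ (D + N).T + X @ Delta
    f4 = B.T @ R + 2.0 * w1 ** 2 * Y
    f5 = 2.0 * w2 ** 2 * stk(N) - stk(X.T @ R)
    return np.concatenate([f3.flatten(order="F"), f4.flatten(order="F"), f5])

def initial_multipliers(X, Y, N, D, A, B, w1=1.0, w2=1.0):
    """Least-squares estimate of (gamma_0, delta_0) from the over-determined system."""
    n = X.shape[0]
    nz = n * n + n * (n + 1) // 2
    args = (X, Y, N, D, A, B, w1, w2)
    c = stationarity_residual(np.zeros(nz), *args)
    # The residual is affine in z: K z + c. Recover K column by column.
    K = np.column_stack([stationarity_residual(e, *args) - c for e in np.eye(nz)])
    z, *_ = np.linalg.lstsq(K, -c, rcond=None)
    return z, K, c

# Hypothetical data standing in for the SCHUR starting point (X0, Y0, N0).
rng = np.random.default_rng(3)
n, m = 4, 2
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, m))
X0, _ = np.linalg.qr(rng.standard_normal((n, n)))
Y0 = rng.standard_normal((m, n))
D = np.diag([-1.0, -2.0, -3.0, -4.0])
N0 = np.triu(rng.standard_normal((n, n)), k=1)
z0, K, c = initial_multipliers(X0, Y0, N0, D, A, B)
```

For $n = 4$, $m = 2$ the system has 30 equations in 26 unknowns, so it is over-determined as stated in the algorithm.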



16.3.2 Complex Eigenvalues 

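When some of the closed-loop eigenvalues are complex, we assume that the given 
eigenvalues are $\mathcal{L} = \{\lambda_1, \ldots, \lambda_{n-2s}; \alpha_1 \pm \beta_1 i, \ldots, \alpha_s \pm \beta_s i\}$, where $s$ is the 
number of complex conjugate eigenvalue pairs, and $\lambda_j$, $\alpha_j$ and $\beta_j$ are real. 

As in the real eigenvalue case, using a modified real Schur form, the real 
vectors $x_j, x_{j+1} \in \mathbb{R}^n$, $y_j, y_{j+1} \in \mathbb{R}^m$ and $\eta_j, \eta_{j+1} \in \mathbb{R}^{j-1}$ are chosen via 

$$A[x_j, x_{j+1}] + B[y_j, y_{j+1}] - [x_j, x_{j+1}] D_j - X_{-j} [\eta_j, \eta_{j+1}] = 0,$$

with 

$$D_j = \Phi(\alpha_j, \beta_j) \equiv \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix}. \qquad (16.14)$$

Consequently, we have $M_j [x_j^T, x_{j+1}^T, y_j^T, y_{j+1}^T, \eta_j^T, \eta_{j+1}^T]^T = 0$ where 

$$M_j \equiv \left[ I_2 \otimes A - D_j^T \otimes I_n, \;\; I_2 \otimes B, \;\; -I_2 \otimes X_{-j} \right]. \qquad (16.15)$$

With QR to extract the unitary basis of the null space, we have 

$$[x_j^T, x_{j+1}^T, y_j^T, y_{j+1}^T, \eta_j^T, \eta_{j+1}^T]^T = [S_{1j}^T, S_{3j}^T]^T u_j,$$

where $S_{1j}$ contains the rows associated with $x_j, x_{j+1}$ and $S_{3j}$ the remaining rows. 
We are then required to choose $u_j$ such that 

$$\min_{u_j} \frac{u_j^T S_{2j}^T S_{2j} u_j}{u_j^T S_{1j}^T S_{1j} u_j}, \qquad S_{2j} \equiv \begin{bmatrix} \omega_1 I & 0 \\ 0 & \omega_2 I \end{bmatrix} S_{3j},$$

where $\omega_k$ are some weights in the objective function, to vary the emphasis on the 
feedback gain (the size of $F$) and the condition of the closed-loop eigenvalues 
(the departure from normality). The minimization can then be achieved by the 
GSVD of $S = (S_{1j}, S_{2j})$, by choosing $u_j$ to be the generalized singular vector of $S$ 
corresponding to the smallest generalized singular value. 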

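In the Real Schur Decomposition in (16.22) with $a_j, b_j \in \mathbb{R}$, $a_{2j-1} + a_{2j} = 2\alpha_j$, 
$a_{2j-1} a_{2j} - b_{2j-1} b_{2j} = \alpha_j^2 + \beta_j^2$ for $j = 1, \ldots, s$; we have 

$$D = \lambda_1 \oplus \cdots \oplus \lambda_{n-2s} \oplus \begin{bmatrix} a_1 & b_2 \\ b_1 & a_2 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} a_{2s-1} & b_{2s} \\ b_{2s-1} & a_{2s} \end{bmatrix}$$

and 

$$N = \begin{bmatrix}
0 & \eta_{1,2} & \cdots & \eta_{1,n-2s} & \eta_{1,n-2s+1} & \eta_{1,n-2s+2} & \cdots & \eta_{1,n-1} & \eta_{1,n} \\
 & 0 & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
 & & \ddots & \eta_{n-2s-1,n-2s} & \eta_{n-2s-1,n-2s+1} & \eta_{n-2s-1,n-2s+2} & \cdots & \eta_{n-2s-1,n-1} & \eta_{n-2s-1,n} \\
 & & & 0 & \eta_{n-2s,n-2s+1} & \eta_{n-2s,n-2s+2} & \cdots & \eta_{n-2s,n-1} & \eta_{n-2s,n} \\
 & & & & 0 & 0 & \cdots & \vdots & \vdots \\
 & & & & & 0 & \cdots & \vdots & \vdots \\
 & & & & & & \ddots & \eta_{n-2,n-1} & \eta_{n-2,n} \\
 & & & & & & & 0 & 0 \\
 & & & & & & & & 0
\end{bmatrix} \qquad (16.16)$$

whose columns are $\eta_2, \eta_3, \ldots, \eta_n$ padded with zeros; the entries inside the $2 \times 2$ 
diagonal blocks of $D$ are zero in $N$. 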
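
The two scalar conditions on each $2 \times 2$ block fix its trace and determinant, and hence its eigenvalues $\alpha_j \pm \beta_j i$, while leaving two degrees of freedom. A small NumPy check, given as an illustration only: the helper name and the free parameters `shift` and `b1` are hypothetical, and the placement of $b_{2j-1}$ and $b_{2j}$ within the block is an assumption that does not affect the eigenvalues.

```python
import numpy as np

def real_block(alpha, beta, shift=0.0, b1=1.0):
    """2x2 real block with eigenvalues alpha +/- i*beta.

    Enforces a1 + a2 = 2*alpha and a1*a2 - b1*b2 = alpha^2 + beta^2;
    `shift` and `b1` (nonzero) parametrize the remaining freedom.
    """
    a1, a2 = alpha + shift, alpha - shift
    b2 = (a1 * a2 - (alpha ** 2 + beta ** 2)) / b1
    return np.array([[a1, b2], [b1, a2]])

# Complex pair of Example 2: 2.5201 +/- 6.8910 i.
block = real_block(2.5201, 6.8910, shift=0.3, b1=2.0)
eigs = np.linalg.eigvals(block)
```

Newton's iteration treats $a_j$ and $b_j$ as unknowns subject to these two constraints, which is why they appear with their own Lagrange multipliers in Optimization Problem 2.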

From the initial point, which is obtained from a modified real Schur form and is 
not usually feasible, we use Newton's algorithm to optimize the problem. We 
arrive at the optimization problem for complex eigenvalues: 



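Optimization Problem 2 

$$\min \; \omega_1^2 \|Y\|_F^2 + \omega_2^2 \|N\|_F^2 \quad \text{s.t.} \quad \begin{cases} AX + BY - X(D + N) = 0 \\ X^T X - I = 0 \end{cases}$$

where $D$ and $N$ are as previously defined, and $X$ is $n \times n$ orthogonal. Optimization 
Problem 2 is equivalent to: 

$$\min \; \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \, \mathrm{Stk}(N)^T \mathrm{Stk}(N)$$

$$\text{s.t.} \quad \begin{cases} (I \otimes A - D^T \otimes I - N^T \otimes I) v(X) + (I \otimes B) v(Y) = 0 \\ d_0(X)^T v(X) - \mathrm{Stk}(I) = 0 \\ a_{2j-1} + a_{2j} - 2\alpha_j = 0 \quad (j = 1, \ldots, s) \\ a_{2j-1} a_{2j} - b_{2j-1} b_{2j} - (\alpha_j^2 + \beta_j^2) = 0 \quad (j = 1, \ldots, s) \end{cases}$$

for which the Lagrangian function equals 

$$\begin{aligned} L(\gamma, \delta, \omega, \xi, v(X), v(Y), a, b, \mathrm{Stk}(N)) = {} & \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \, \mathrm{Stk}(N)^T \mathrm{Stk}(N) \\ & + \gamma^T \left[ (I \otimes A - D^T \otimes I - N^T \otimes I) v(X) + (I \otimes B) v(Y) \right] + \delta^T \left[ d_0(X)^T v(X) - \mathrm{Stk}(I) \right] \\ & + \sum_{j=1}^{s} \omega_j (a_{2j-1} + a_{2j} - 2\alpha_j) + \sum_{j=1}^{s} \xi_j (a_{2j-1} a_{2j} - b_{2j-1} b_{2j} - \alpha_j^2 - \beta_j^2) \end{aligned}$$

where $\omega = [\omega_1, \omega_2, \ldots, \omega_s]^T$, $\xi = [\xi_1, \xi_2, \ldots, \xi_s]^T$, $a = [a_1, a_2, \ldots, a_{2s}]^T$, $b = [b_1, 
b_2, \ldots, b_{2s}]^T$ and $\mathrm{Stk}(N) = [\eta_2^T, \ldots, \eta_n^T]^T$ (the multipliers $\omega_j$ in the sums are distinct 
from the weights $\omega_1, \omega_2$ in the objective). 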
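The derivatives of $L$ satisfy 

$$f_1 \equiv \frac{\partial L}{\partial \gamma} = (I \otimes A - D^T \otimes I - N^T \otimes I) v(X) + (I \otimes B) v(Y) = 0,$$

$$f_2 \equiv \frac{\partial L}{\partial \delta} = d_0(X)^T v(X) - \mathrm{Stk}(I) = 0, \qquad f_3 \equiv \frac{\partial L}{\partial \omega} = \begin{bmatrix} a_1 + a_2 - 2\alpha_1 \\ \vdots \\ a_{2s-1} + a_{2s} - 2\alpha_s \end{bmatrix} = 0,$$

$$f_4 \equiv \frac{\partial L}{\partial \xi} = \begin{bmatrix} a_1 a_2 - b_1 b_2 - (\alpha_1^2 + \beta_1^2) \\ \vdots \\ a_{2s-1} a_{2s} - b_{2s-1} b_{2s} - (\alpha_s^2 + \beta_s^2) \end{bmatrix} = 0,$$

$$f_5 \equiv \frac{\partial L}{\partial v(X)} = (I \otimes A^T - D \otimes I - N \otimes I) \gamma + v(X\Delta) = 0, \qquad (16.17)$$

$$f_6 \equiv \frac{\partial L}{\partial v(Y)} = (I \otimes B^T) \gamma + 2\omega_1^2 v(Y) = 0, \qquad (16.18)$$

$$f_7 \equiv \frac{\partial L}{\partial a} = \begin{bmatrix} \omega_1 \\ \omega_1 \\ \vdots \\ \omega_s \\ \omega_s \end{bmatrix} + \begin{bmatrix} \xi_1 a_2 \\ \xi_1 a_1 \\ \vdots \\ \xi_s a_{2s} \\ \xi_s a_{2s-1} \end{bmatrix} - \left[ \gamma_{n-2s+1}^T \oplus \cdots \oplus \gamma_n^T \right] v([x_{n-2s+1}, \ldots, x_n]) = 0, \qquad (16.19)$$

$$f_8 \equiv \frac{\partial L}{\partial b} = - \begin{bmatrix} \xi_1 b_2 \\ \xi_1 b_1 \\ \vdots \\ \xi_s b_{2s} \\ \xi_s b_{2s-1} \end{bmatrix} - \left[ \gamma_{n-2s+1}^T \oplus \cdots \oplus \gamma_n^T \right] \Pi_s \, v([x_{n-2s+1}, \ldots, x_n]) = 0, \qquad (16.20)$$

$$f_9 \equiv \frac{\partial L}{\partial \mathrm{Stk}(N)} = 2\omega_2^2 \, \mathrm{Stk}(N) - d_1(X, s)^T \gamma = 0, \qquad (16.21)$$

where 

$$\Pi_s = \underbrace{\begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix}}_{s}, \qquad d_1(C, s) \equiv C_r \oplus C_l$$

with $C_r = 0 \oplus c_1 \oplus [c_1, c_2] \oplus \cdots \oplus [c_1, \ldots, c_{n-2s-1}]$ and 

$$C_l = \underbrace{[c_1, \ldots, c_{n-2s}] \oplus [c_1, \ldots, c_{n-2s}] \oplus \cdots \oplus [c_1, \ldots, c_{n-2}] \oplus [c_1, \ldots, c_{n-2}]}_{2s}.$$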

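We then obtain the symmetric gradient matrix of $f \equiv [f_1^T, f_2^T, \ldots, f_9^T]^T$ (upper triangular blocks shown): 

$$J_f = \begin{bmatrix}
0 & 0 & 0 & 0 & \Omega & \Psi & -d_4(X,s) & -\tilde d_4(X,s) & -d_1(X,s) \\
 & 0 & 0 & 0 & S & 0 & 0 & 0 & 0 \\
 & & 0 & 0 & 0 & 0 & d_5(e) & 0 & 0 \\
 & & & 0 & 0 & 0 & d_5(a) & -d_5(b) & 0 \\
 & & & & \Delta \otimes I & 0 & -d_4(R,s) & -d_6(R,s) & -d_8(R,s)^T \\
 & & & & & 2\omega_1^2 I & 0 & 0 & 0 \\
 & & & & & & d_7(\xi) & 0 & 0 \\
 & & & & & & & -d_7(\xi) & 0 \\
 & & & & & & & & 2\omega_2^2 I_p
\end{bmatrix}$$

where $\Omega = I \otimes A - D^T \otimes I - N^T \otimes I$, $\Psi = I \otimes B$, $p = (n - 2s)(n - 2s - 1)/2 + 
2s(n - s - 1)$, $e = [1, 1, \ldots, 1]^T$, $S = d_0(X)^T + d_2(X^T)$, 

$$d_4(C, s) = \begin{bmatrix} 0 \\ c_{n-2s+1} \oplus c_{n-2s+2} \oplus \cdots \oplus c_{n-1} \oplus c_n \end{bmatrix}, \qquad \tilde d_4(C, s) = \begin{bmatrix} 0 \\ c_{n-2s+2} \oplus c_{n-2s+1} \oplus \cdots \oplus c_n \oplus c_{n-1} \end{bmatrix},$$

both with a leading zero block of $k(n - 2s)$ rows, a nonzero block of $k(2s)$ rows, and $2s$ columns, 

$$d_5(\psi) = d_5([\psi_1, \ldots, \psi_{2s}]^T) = [\psi_2, \psi_1] \oplus [\psi_4, \psi_3] \oplus \cdots \oplus [\psi_{2s}, \psi_{2s-1}],$$

$$d_6(C, s) = \begin{bmatrix} 0 \\ \begin{bmatrix} 0 & c_{n-2s+2} \\ c_{n-2s+1} & 0 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} 0 & c_n \\ c_{n-1} & 0 \end{bmatrix} \end{bmatrix} \begin{matrix} \} \, k(n - 2s) \\ \} \, k(2s) \end{matrix},$$

$$d_7(\xi) = d_7([\xi_1, \ldots, \xi_s]^T) = \begin{bmatrix} 0 & \xi_1 \\ \xi_1 & 0 \end{bmatrix} \oplus \begin{bmatrix} 0 & \xi_2 \\ \xi_2 & 0 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} 0 & \xi_s \\ \xi_s & 0 \end{bmatrix}$$

and 

$$d_8(C, s) = \begin{bmatrix}
(c_2)^T & 0 \\
(c_3)^T \oplus (c_3)^T & 0 \\
(c_4)^T \oplus (c_4)^T \oplus (c_4)^T & 0 \\
\vdots & \vdots \\
(c_{n-2s})^T \oplus \cdots \oplus (c_{n-2s})^T & 0 \\
(c_{n-2s+1})^T \oplus \cdots \oplus (c_{n-2s+1})^T & 0 \\
(c_{n-2s+2})^T \oplus \cdots \oplus (c_{n-2s+2})^T & 0 \\
\vdots & \vdots \\
(c_{n-1})^T \oplus \cdots \oplus (c_{n-1})^T & 0 \\
(c_n)^T \oplus \cdots \oplus (c_n)^T & 0
\end{bmatrix}
\begin{matrix} \} \, 1 \\ \} \, 2 \\ \} \, 3 \\ \vdots \\ \} \, n - 2s - 1 \\ \} \, n - 2s \\ \} \, n - 2s \\ \vdots \\ \} \, n - 2 \\ \} \, n - 2 \end{matrix}$$

Modifying Algorithm 1, we solve Optimization Problem 2 by Newton's iteration 
to obtain a real feedback matrix $F$: 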


Algorithm 2 

1. Use SCHUR to find the initial $X_0$ and $N_0$. 

2. Substitute $X_0$, $N_0$ into (16.17)-(16.20) to obtain $[\gamma_0^T, \delta_0^T, \omega_0^T, \xi_0^T]^T$. 

3. Use $[\gamma_0^T, \delta_0^T, \omega_0^T, \xi_0^T, v(X_0)^T, v(Y_0)^T, a_0^T, b_0^T, \mathrm{Stk}(N_0)^T]^T$ as the initial guess, 
apply Newton's iteration (16.12) until convergence to $X$, $Y$, $N$. 

4. Substitute the $X$, $N$ and $\Lambda$ into (16.6) to obtain the feedback matrix $F$. 

Remarks At step 3, the initial point $[\gamma_0^T, \delta_0^T, \omega_0^T, \xi_0^T, v(X_0)^T, v(Y_0)^T, a_0^T, b_0^T, 
\mathrm{Stk}(N_0)^T]^T$ is sometimes far away from the optimal point. In such an event, we 
apply the GBB Gradient method [10] to decrease the objective function sufficiently, 
before Newton's iteration is applied. At step 4, since the matrix $X$ is 
orthogonal, we can use $X^T$ in place of $X^{-1}$. 
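
The idea of a Barzilai-Borwein (BB) gradient step, on which the GBB method is based, can be sketched as follows. This is a simplified illustration and not the GBB method of [10]: it omits the nonmonotone line search that makes GBB globally convergent, and it is demonstrated on a hypothetical strictly convex quadratic rather than on the pole assignment objective. The function name and default parameters are assumptions.

```python
import numpy as np

def bb_gradient(grad, x0, step0=1e-2, tol=1e-10, max_iter=1000):
    """Plain Barzilai-Borwein gradient iteration (no globalization safeguard)."""
    x = np.array(x0, dtype=float)
    g = grad(x)
    step = step0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - step * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        # BB step length s^T s / s^T y; fall back to step0 if curvature is not positive.
        step = (s @ s) / sy if sy > 0 else step0
        x, g = x_new, g_new
    return x

# Hypothetical test problem: minimize 0.5 x^T Q x - b^T x with Q symmetric positive definite.
Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_min = bb_gradient(lambda x: Q @ x - b, np.zeros(3))
```

Only gradients are needed, which makes such a method a cheap way to move a poor starting point closer to the region where Newton's iteration converges.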



16.4 Descriptor System 

Multiplying by the nonsingular matrix $Z^{-1}$ and the orthogonal matrix $X$ on both sides 
of $A + BF$ and $E + BG$, we get 

$$(A + BF)X = Z(D_\alpha + N_\alpha), \qquad (E + BG)X = Z(D_\beta + N_\beta), \qquad (16.22)$$

where $D_\alpha$, $D_\beta$ are diagonal, and $N_\alpha$, $N_\beta$ are strictly upper triangular. 

From the QR decomposition in (16.5), we have $Q_2^T B = 0$ and $B^{\dagger} = R_B^{-1} Q_1^T$. 
Pre-multiplying the equations in (16.22), respectively, by $Q_2^T$ and $B^{\dagger}$, we obtain 

$$Q_2^T AX - Q_2^T Z D_\alpha - Q_2^T Z N_\alpha = 0, \qquad Q_2^T EX - Q_2^T Z D_\beta - Q_2^T Z N_\beta = 0; \qquad (16.23)$$

$$F = R_B^{-1} Q_1^T [Z(D_\alpha + N_\alpha) X^{-1} - A], \qquad G = R_B^{-1} Q_1^T [Z(D_\beta + N_\beta) X^{-1} - E]. \qquad (16.24)$$

For given eigenvalue pairs $\{D_\alpha, D_\beta\}$, we can select $Z$, $X$ from (16.23) and then obtain 
the solution to the pole assignment problem using (16.24). 
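
The recovery of $F$ and $G$ through (16.24) can be checked numerically on synthetic data. The sketch below is illustrative only: the orthogonal $X$ and $Z$, the triangular factors and the feedbacks `F_true`, `G_true` are hypothetical test data chosen so that (16.22) holds by construction, and the pairs $(\alpha_i, \beta_i)$ are not normalized to $\alpha_i^2 + \beta_i^2 = 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 2

# Hypothetical test data consistent with (16.22).
B = rng.standard_normal((n, m))
X, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))
alpha = np.ones(n)
beta = np.array([-1.0, -2.0, -3.0, -4.0])
Lam_a = np.diag(alpha) + np.triu(rng.standard_normal((n, n)), k=1)   # D_alpha + N_alpha
Lam_b = np.diag(beta) + np.triu(rng.standard_normal((n, n)), k=1)    # D_beta + N_beta
F_true = rng.standard_normal((m, n))
G_true = rng.standard_normal((m, n))
A = Z @ Lam_a @ X.T - B @ F_true
E = Z @ Lam_b @ X.T - B @ G_true

# Feedback matrices from (16.24), using B = Q1 R_B and X^{-1} = X^T.
Q, R = np.linalg.qr(B, mode="complete")
Q1, R_B = Q[:, :m], R[:m, :]
F = np.linalg.solve(R_B, Q1.T @ (Z @ Lam_a @ X.T - A))
G = np.linalg.solve(R_B, Q1.T @ (Z @ Lam_b @ X.T - E))

# Finite generalized eigenvalues alpha_i / beta_i of the closed-loop pencil.
closed_loop = np.linalg.eigvals(np.linalg.solve(E + B @ G, A + B @ F))
```

Because $E + BG = Z(D_\beta + N_\beta)X^T$ is nonsingular here, the closed-loop pencil has the finite eigenvalues $\alpha_i / \beta_i$, which the last line reproduces.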

Let $Y = [Y_1^T, Y_2^T]^T = [F^T, G^T]^T X$. When $X$ and $Y$ are chosen, the feedback 
matrices can be obtained through $H \equiv [F^T, G^T]^T = YX^{-1}$. Furthermore, minimizing 
the norm of $Y$ is equivalent to minimizing the feedback gain $\|H\|$. 



16.4.1 Real Eigenvalues 

Let us first consider the case when all the closed-loop eigenvalues are real, with the 
closed-loop system matrix pencil $(A_c, E_c) = (A + BF, E + BG) = (Z \Lambda_\alpha X^T, Z \Lambda_\beta X^T)$ 
in Schur form. Here we have $(\Lambda_\alpha, \Lambda_\beta) = (D_\alpha + N_\alpha, D_\beta + N_\beta)$, with $D_\alpha = \mathrm{diag}\{\alpha_1, \ldots, \alpha_n\}$, 
$D_\beta = \mathrm{diag}\{\beta_1, \ldots, \beta_n\}$ being real, $N_\alpha = [\hat\eta_1, \hat\eta_2, \ldots, \hat\eta_n]$, $N_\beta = 
[\hat\zeta_1, \hat\zeta_2, \ldots, \hat\zeta_n]$ being strictly upper triangular and nilpotent, and $\eta_j = 
[\eta_{1,j}, \ldots, \eta_{j-1,j}]^T$, $\zeta_j = [\zeta_{1,j}, \ldots, \zeta_{j-1,j}]^T$ are the vectors constructed from $\hat\eta_j$ and $\hat\zeta_j$ with 
the zeroes at the bottom deleted (thus $\eta_1$, $\zeta_1$ are degenerate and $\eta_j, \zeta_j \in \mathbb{R}^{j-1}$). The 
Schur vector matrix $X$ is orthogonal. 

Similar to Sect. 16.3.1, we seek $X$, $Y_1$, $Y_2$, $Z$, $N_\alpha$ and $N_\beta$ such that 

$$AX + BY_1 - Z(D_\alpha + N_\alpha) = 0, \qquad EX + BY_2 - Z(D_\beta + N_\beta) = 0, \qquad X^T X = I.$$
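Considering the $j$th column, we have 

$$M_j \left[ x_j^T, y_{1j}^T, y_{2j}^T, \eta_j^T, \zeta_j^T, z_j^T \right]^T = 0 \qquad (16.25)$$

with 

$$M_j \equiv \begin{bmatrix} A & B & 0 & -Z_{-j} & 0 & -\alpha_j I \\ E & 0 & B & 0 & -Z_{-j} & -\beta_j I \\ X_{-j}^T & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & Z_{-j}^T \end{bmatrix}.$$

With QR to extract the unitary basis of the null space, we have 

$$\left[ x_j^T, y_{1j}^T, y_{2j}^T, \eta_j^T, \zeta_j^T, z_j^T \right]^T = S_j u_j,$$

where $S_{1j}$ denotes the rows of $S_j$ associated with $x_j$ and $S_{3j}$ the rows associated 
with $y_{1j}, y_{2j}, \eta_j, \zeta_j$. We are then required to choose $u_j$ such that 

$$\min_{u_j} \frac{u_j^T S_{2j}^T S_{2j} u_j}{u_j^T S_{1j}^T S_{1j} u_j}, \qquad S_{2j} \equiv \begin{bmatrix} \omega_1 I & 0 \\ 0 & \omega_2 I \end{bmatrix} S_{3j},$$

where $\omega_k$ are some weights in the objective function. The minimization can then 
be achieved by the GSVD of $S = (S_{1j}, S_{2j})$, by choosing $u_j$ to be the generalized 
singular vector of $S$ corresponding to the smallest generalized singular value. 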

Remarks In Definition 2.1, Z is only required to be nonsingular, but this will be 
inconvenient to achieve in practice. If it is unconstrained, an ill-conditioned Z may 
cause problems in the Schur-Newton refinement in the next section. Consequently, 
we require in the calculations in (16.25) and (16.15) that Z is unitary with perfect 
condition. 

From the initial point, we use Newton's algorithm to optimize the problem. 



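Optimization Problem 3 

$$\min \; \omega_1^2 \|Y\|_F^2 + \omega_2^2 \| [N_\alpha, N_\beta] \|_F^2 \quad \text{s.t.} \quad \begin{cases} AX + BY_1 - Z(D_\alpha + N_\alpha) = 0 \\ EX + BY_2 - Z(D_\beta + N_\beta) = 0 \\ X^T X - I = 0 \end{cases}$$

where $N_\alpha$, $N_\beta$ are $n \times n$ strictly upper triangular, $X$ is $n \times n$ orthogonal and $Z$ is 
nonsingular. 
Optimization Problem 3 is equivalent to: 

$$\min \; \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \left[ \mathrm{Stk}(N_\alpha)^T \mathrm{Stk}(N_\alpha) + \mathrm{Stk}(N_\beta)^T \mathrm{Stk}(N_\beta) \right]$$

$$\text{s.t.} \quad \begin{cases} (I \otimes A) v(X) + (I \otimes B) v(Y_1) - (D_\alpha^T \otimes I) v(Z) - (N_\alpha^T \otimes I) v(Z) = 0 \\ (I \otimes E) v(X) + (I \otimes B) v(Y_2) - (D_\beta^T \otimes I) v(Z) - (N_\beta^T \otimes I) v(Z) = 0 \\ d_0(X)^T v(X) - \mathrm{Stk}(I) = 0 \end{cases}$$

where 

$$\mathrm{Stk}(N_\alpha) = [\eta_{1,2} \,|\, \eta_{1,3}, \eta_{2,3} \,|\, \cdots \,|\, \eta_{1,n}, \ldots, \eta_{n-1,n}]^T \in \mathbb{R}^{p}, \qquad \mathrm{Stk}(N_\beta) = [\zeta_{1,2} \,|\, \zeta_{1,3}, \zeta_{2,3} \,|\, \cdots \,|\, \zeta_{1,n}, \ldots, \zeta_{n-1,n}]^T \in \mathbb{R}^{p},$$

with $p = n(n-1)/2$. We then consider the Lagrangian function of Optimization Problem 3 

$$\begin{aligned} L(\gamma, \epsilon, \delta, v(X), v(Y_1), v(Y_2), v(Z), \mathrm{Stk}(N_\alpha), \mathrm{Stk}(N_\beta)) = {} & \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \left[ \mathrm{Stk}(N_\alpha)^T \mathrm{Stk}(N_\alpha) + \mathrm{Stk}(N_\beta)^T \mathrm{Stk}(N_\beta) \right] \\ & + \gamma^T \left[ (I \otimes A) v(X) + (I \otimes B) v(Y_1) - (D_\alpha^T \otimes I) v(Z) - (N_\alpha^T \otimes I) v(Z) \right] \\ & + \epsilon^T \left[ (I \otimes E) v(X) + (I \otimes B) v(Y_2) - (D_\beta^T \otimes I) v(Z) - (N_\beta^T \otimes I) v(Z) \right] \\ & + \delta^T \left[ d_0(X)^T v(X) - \mathrm{Stk}(I) \right] \end{aligned}$$

where $\gamma = [\gamma_1^T, \ldots, \gamma_n^T]^T$, $\epsilon = [\epsilon_1^T, \ldots, \epsilon_n^T]^T$ and 

$$W \equiv [\epsilon_1, \epsilon_2, \ldots, \epsilon_n].$$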

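The derivatives of $L$ satisfy 

$$f_1 \equiv \frac{\partial L}{\partial \gamma} = (I \otimes A) v(X) + (I \otimes B) v(Y_1) - (D_\alpha^T \otimes I) v(Z) - (N_\alpha^T \otimes I) v(Z) = 0,$$

$$f_2 \equiv \frac{\partial L}{\partial \epsilon} = (I \otimes E) v(X) + (I \otimes B) v(Y_2) - (D_\beta^T \otimes I) v(Z) - (N_\beta^T \otimes I) v(Z) = 0,$$

$$f_3 \equiv \frac{\partial L}{\partial \delta} = d_0(X)^T v(X) - \mathrm{Stk}(I) = 0,$$

$$f_4 \equiv \frac{\partial L}{\partial v(X)} = (I \otimes A^T) \gamma + (I \otimes E^T) \epsilon + v(X\Delta) = 0, \qquad (16.26)$$

$$f_5 \equiv \frac{\partial L}{\partial v(Y_1)} = (I \otimes B^T) \gamma + 2\omega_1^2 v(Y_1) = 0, \qquad (16.27)$$

$$f_6 \equiv \frac{\partial L}{\partial v(Y_2)} = (I \otimes B^T) \epsilon + 2\omega_1^2 v(Y_2) = 0, \qquad (16.28)$$

$$f_7 \equiv \frac{\partial L}{\partial v(Z)} = -[(D_\alpha \otimes I) + (N_\alpha \otimes I)] \gamma - [(D_\beta \otimes I) + (N_\beta \otimes I)] \epsilon = 0, \qquad (16.29)$$

$$f_8 \equiv \frac{\partial L}{\partial \mathrm{Stk}(N_\alpha)} = 2\omega_2^2 \, \mathrm{Stk}(N_\alpha) - d_1(Z)^T \gamma = 0, \qquad (16.30)$$

$$f_9 \equiv \frac{\partial L}{\partial \mathrm{Stk}(N_\beta)} = 2\omega_2^2 \, \mathrm{Stk}(N_\beta) - d_1(Z)^T \epsilon = 0. \qquad (16.31)$$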
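We apply Newton's method to $f \equiv [f_1^T, f_2^T, \ldots, f_9^T]^T$ with respect to the variables 
$[\gamma^T, \epsilon^T, \delta^T, v(X)^T, v(Y_1)^T, v(Y_2)^T, v(Z)^T, \mathrm{Stk}(N_\alpha)^T, \mathrm{Stk}(N_\beta)^T]^T$, where the 
symmetric (upper triangular blocks shown) 

$$J_f = \begin{bmatrix}
0 & 0 & 0 & I \otimes A & I \otimes B & 0 & -D_\alpha^T \otimes I - N_\alpha^T \otimes I & -d_1(Z) & 0 \\
 & 0 & 0 & I \otimes E & 0 & I \otimes B & -D_\beta^T \otimes I - N_\beta^T \otimes I & 0 & -d_1(Z) \\
 & & 0 & d_{02} & 0 & 0 & 0 & 0 & 0 \\
 & & & \Delta \otimes I & 0 & 0 & 0 & 0 & 0 \\
 & & & & 2\omega_1^2 I_{mn} & 0 & 0 & 0 & 0 \\
 & & & & & 2\omega_1^2 I_{mn} & 0 & 0 & 0 \\
 & & & & & & 0 & -d_{3\gamma} & -d_{3\epsilon} \\
 & & & & & & & 2\omega_2^2 I_p & 0 \\
 & & & & & & & & 2\omega_2^2 I_p
\end{bmatrix}$$

with $d_{02} = d_0(X)^T + d_2(X^T)$, $d_{3\gamma} = d_3([\gamma_2, \ldots, \gamma_n])$ and $d_{3\epsilon} = d_3([\epsilon_2, \ldots, \epsilon_n])$. 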

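Applying Newton's method to Optimization Problem 3, we obtain $Z$, $X$ and then, by 
using (16.24), the feedback matrices $F$, $G$. Now, we can write down the Schur-Newton 
Algorithm for the RPAP_DS with real eigenvalues: 

Algorithm 3 (Real Schur-Newton) 

1. Use SCHUR to find the initial $X_0$, $Z_0$, $Y_{10}$, $Y_{20}$, $N_{\alpha 0}$ and $N_{\beta 0}$. 

2. Substitute $X_0$, $Z_0$, $Y_{10}$, $Y_{20}$, $N_{\alpha 0}$ and $N_{\beta 0}$ into (16.26-16.29) to construct: 

$$\begin{bmatrix} I \otimes A^T & I \otimes E^T & d_{02}^T \\ I \otimes B^T & 0 & 0 \\ 0 & I \otimes B^T & 0 \\ \Phi_3 & \Phi_4 & 0 \\ d_1(Z)^T & 0 & 0 \\ 0 & d_1(Z)^T & 0 \end{bmatrix} \begin{bmatrix} \gamma_0 \\ \epsilon_0 \\ \delta_0 \end{bmatrix} = \begin{bmatrix} 0 \\ -2\omega_1^2 v(Y_1) \\ -2\omega_1^2 v(Y_2) \\ 0 \\ 2\omega_2^2 \, \mathrm{Stk}(N_\alpha) \\ 2\omega_2^2 \, \mathrm{Stk}(N_\beta) \end{bmatrix} \qquad (16.32)$$

   with $\Phi_3 \equiv -(D_\alpha \otimes I + N_\alpha \otimes I)$ and $\Phi_4 \equiv -(D_\beta \otimes I + N_\beta \otimes I)$. Solve the 
over-determined system (16.32) in the least squares sense for $\gamma_0$, $\epsilon_0$ and $\delta_0$. 

3. Choose $\{\gamma_0, \epsilon_0, \delta_0, v(X_0), v(Z_0), Y_{10}, Y_{20}, \mathrm{Stk}(N_{\alpha 0}), \mathrm{Stk}(N_{\beta 0})\}$ as the starting 
values, run Newton's iteration for $f$ until convergence to $X$, $Z$, $N_\alpha$ and $N_\beta$. 

4. Substitute the $X$, $Z$, $Y_1$, $Y_2$, $N_\alpha$ and $N_\beta$ into (16.24) to obtain $F$, $G$. 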
16.4.2 Complex Eigenvalues 

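Let $\{(\alpha_1, \beta_1), \ldots, (\alpha_{n-2s}, \beta_{n-2s}); (\mu_1 \pm i\nu_1, \kappa_1 \pm i\tau_1), \ldots, (\mu_s \pm i\nu_s, \kappa_s \pm i\tau_s)\}$ be 
the prescribed eigenvalues, where $s$ is the number of complex eigenvalue pairs. We 
seek feedback matrices $F, G \in \mathbb{R}^{m \times n}$ such that $\sigma(A + BF, E + BG) = \sigma(D_\alpha, D_\beta)$ 
where $\alpha_j, \beta_j, \mu_l, \nu_l, \kappa_l, \tau_l \in \mathbb{R}$, $\alpha_j^2 + \beta_j^2 = 1 = \mu_l^2 + \nu_l^2 + \kappa_l^2 + \tau_l^2$ $(j = 1, \ldots, n - 2s$; 
$l = 1, \ldots, s)$, $D_\alpha = \mathrm{diag}\{\alpha_1, \ldots, \alpha_{n-2s}; \mu_1 \pm i\nu_1, \ldots, \mu_s \pm i\nu_s\}$ and $D_\beta \equiv \mathrm{diag}\{\beta_1, \ldots, 
\beta_{n-2s}; \kappa_1 \pm i\tau_1, \ldots, \kappa_s \pm i\tau_s\}$. 

Consequently, we are required to choose $X$, $Y_1$, $Y_2$, $Z$, $N_\alpha$ and $N_\beta$ such that 

$$AX + BY_1 - Z(D_\alpha + N_\alpha) = 0, \qquad EX + BY_2 - Z(D_\beta + N_\beta) = 0, \qquad X^T X = I.$$

Using a modified real Schur form, the real vectors $x_j, x_{j+1}, z_j, z_{j+1}, \eta_j, \eta_{j+1}, \zeta_j$ 
and $\zeta_{j+1}$ are chosen via 

$$A[x_j, x_{j+1}] + B[y_{1j}, y_{1,j+1}] - [z_j, z_{j+1}] D_{\alpha j} - Z_{-j} [\eta_j, \eta_{j+1}] = 0,$$
$$E[x_j, x_{j+1}] + B[y_{2j}, y_{2,j+1}] - [z_j, z_{j+1}] D_{\beta j} - Z_{-j} [\zeta_j, \zeta_{j+1}] = 0,$$
$$X_{-j}^T [x_j, x_{j+1}] = 0;$$

where $D_{\alpha j} = \Phi(\mu_j, \nu_j)$ and $D_{\beta j} = \Phi(\kappa_j, \tau_j)$ [using $\Phi(\cdot, \cdot)$ defined in (16.14)]. So 

$$M_j \left[ x_j^T, x_{j+1}^T, y_{1j}^T, y_{1,j+1}^T, y_{2j}^T, y_{2,j+1}^T, \eta_j^T, \eta_{j+1}^T, \zeta_j^T, \zeta_{j+1}^T, z_j^T, z_{j+1}^T \right]^T = 0,$$

where 

$$M_j \equiv \begin{bmatrix} I_2 \otimes A & I_2 \otimes B & 0 & -I_2 \otimes Z_{-j} & 0 & -D_{\alpha j}^T \otimes I_n \\ I_2 \otimes E & 0 & I_2 \otimes B & 0 & -I_2 \otimes Z_{-j} & -D_{\beta j}^T \otimes I_n \\ I_2 \otimes X_{-j}^T & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & I_2 \otimes Z_{-j}^T \end{bmatrix}.$$

With QR to extract the unitary basis of the null space, we are then required to 
choose $u_j$ such that 

$$\min_{u_j} \frac{u_j^T S_{2j}^T S_{2j} u_j}{u_j^T S_{1j}^T S_{1j} u_j}, \qquad S_{2j} \equiv \begin{bmatrix} \omega_1 I & 0 \\ 0 & \omega_2 I \end{bmatrix} S_{3j},$$

where $\omega_k$ are some weights in the objective function. The minimization can be 
achieved by the GSVD of $S = (S_{1j}, S_{2j})$, by choosing $u_j$ to be the generalized 
singular vector of $S$ corresponding to the smallest generalized singular value. 

In the Real Schur Decomposition (16.22), with $a_j, b_j, c_j, d_j \in \mathbb{R}$, $a_{2j-1} + a_{2j} = 
2\mu_j$, $a_{2j-1} a_{2j} - b_{2j-1} b_{2j} = \mu_j^2 + \nu_j^2$; $c_{2j-1} + c_{2j} = 2\kappa_j$, $c_{2j-1} c_{2j} - d_{2j-1} d_{2j} = \kappa_j^2 + \tau_j^2$ 
for $j = 1, \ldots, s$; we have 

$$D_\alpha = \alpha_1 \oplus \cdots \oplus \alpha_{n-2s} \oplus \begin{bmatrix} a_1 & b_2 \\ b_1 & a_2 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} a_{2s-1} & b_{2s} \\ b_{2s-1} & a_{2s} \end{bmatrix}$$

and 

$$D_\beta = \beta_1 \oplus \cdots \oplus \beta_{n-2s} \oplus \begin{bmatrix} c_1 & d_2 \\ d_1 & c_2 \end{bmatrix} \oplus \cdots \oplus \begin{bmatrix} c_{2s-1} & d_{2s} \\ d_{2s-1} & c_{2s} \end{bmatrix}.$$

The upper triangular $N_\alpha$ is identical to $N$ in (16.16) and $N_\beta$ shares the same 
structure with $\zeta$ replacing all the $\eta$'s. 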

From the initial point, which is calculated from a modified real Schur form 
and is usually infeasible, we can use Newton's algorithm to optimize the problem. 
We arrive at the optimization problem for complex eigenvalues: 

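Optimization Problem 4 

$$\min \; \omega_1^2 \|Y\|_F^2 + \omega_2^2 \| [N_\alpha, N_\beta] \|_F^2 \quad \text{s.t.} \quad \begin{cases} AX + BY_1 - Z(D_\alpha + N_\alpha) = 0 \\ EX + BY_2 - Z(D_\beta + N_\beta) = 0 \\ X^T X - I = 0 \end{cases}$$

where $D_\alpha$, $D_\beta$, $N_\alpha$ and $N_\beta$ are as defined before, $X$ is $n \times n$ orthogonal and $Z$ is 
$n \times n$ nonsingular. 

Optimization Problem 4 is equivalent to: 

$$\min \; \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \left[ \mathrm{Stk}(N_\alpha)^T \mathrm{Stk}(N_\alpha) + \mathrm{Stk}(N_\beta)^T \mathrm{Stk}(N_\beta) \right]$$

$$\text{s.t.} \quad \begin{cases} (I \otimes A) v(X) + (I \otimes B) v(Y_1) - (D_\alpha^T \otimes I) v(Z) - (N_\alpha^T \otimes I) v(Z) = 0 \\ (I \otimes E) v(X) + (I \otimes B) v(Y_2) - (D_\beta^T \otimes I) v(Z) - (N_\beta^T \otimes I) v(Z) = 0 \\ d_0(X)^T v(X) - \mathrm{Stk}(I) = 0 \\ a_{2j-1} + a_{2j} - 2\mu_j = 0 \quad (j = 1, \ldots, s) \\ a_{2j-1} a_{2j} - b_{2j-1} b_{2j} - (\mu_j^2 + \nu_j^2) = 0 \quad (j = 1, \ldots, s) \\ c_{2j-1} + c_{2j} - 2\kappa_j = 0 \quad (j = 1, \ldots, s) \\ c_{2j-1} c_{2j} - d_{2j-1} d_{2j} - (\kappa_j^2 + \tau_j^2) = 0 \quad (j = 1, \ldots, s) \end{cases}$$

for which the Lagrangian function equals 

$$\begin{aligned} L(\gamma, \epsilon, \delta, \omega, \theta, \xi, \sigma, & \, v(X), v(Y_1), v(Y_2), v(Z), a, b, c, d, \mathrm{Stk}(N_\alpha), \mathrm{Stk}(N_\beta)) \\ = {} & \omega_1^2 v(Y)^T v(Y) + \omega_2^2 \left[ \mathrm{Stk}(N_\alpha)^T \mathrm{Stk}(N_\alpha) + \mathrm{Stk}(N_\beta)^T \mathrm{Stk}(N_\beta) \right] \\ & + \gamma^T \left[ (I \otimes A) v(X) + (I \otimes B) v(Y_1) - (D_\alpha^T \otimes I) v(Z) - (N_\alpha^T \otimes I) v(Z) \right] \\ & + \epsilon^T \left[ (I \otimes E) v(X) + (I \otimes B) v(Y_2) - (D_\beta^T \otimes I) v(Z) - (N_\beta^T \otimes I) v(Z) \right] \\ & + \delta^T \left[ d_0(X)^T v(X) - \mathrm{Stk}(I) \right] + \sum_{j=1}^{s} \omega_j (a_{2j-1} + a_{2j} - 2\mu_j) + \sum_{j=1}^{s} \theta_j (c_{2j-1} + c_{2j} - 2\kappa_j) \\ & + \sum_{j=1}^{s} \xi_j (a_{2j-1} a_{2j} - b_{2j-1} b_{2j} - \mu_j^2 - \nu_j^2) + \sum_{j=1}^{s} \sigma_j (c_{2j-1} c_{2j} - d_{2j-1} d_{2j} - \kappa_j^2 - \tau_j^2) \end{aligned}$$

where $\theta = [\theta_1, \theta_2, \ldots, \theta_s]^T$, $\sigma = [\sigma_1, \sigma_2, \ldots, \sigma_s]^T$, $c = [c_1, c_2, \ldots, c_{2s}]^T$ and $d = 
[d_1, d_2, \ldots, d_{2s}]^T$. 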

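The derivatives of $L$ are 

$$f_1 \equiv \frac{\partial L}{\partial \gamma} = (I \otimes A) v(X) + (I \otimes B) v(Y_1) - (D_\alpha^T \otimes I) v(Z) - (N_\alpha^T \otimes I) v(Z) = 0,$$

$$f_2 \equiv \frac{\partial L}{\partial \epsilon} = (I \otimes E) v(X) + (I \otimes B) v(Y_2) - (D_\beta^T \otimes I) v(Z) - (N_\beta^T \otimes I) v(Z) = 0,$$

$$f_3 \equiv \frac{\partial L}{\partial \delta} = d_0(X)^T v(X) - \mathrm{Stk}(I) = 0,$$

$$f_4 \equiv \frac{\partial L}{\partial \omega} = \begin{bmatrix} a_1 + a_2 - 2\mu_1 \\ \vdots \\ a_{2s-1} + a_{2s} - 2\mu_s \end{bmatrix} = 0, \qquad f_5 \equiv \frac{\partial L}{\partial \theta} = \begin{bmatrix} c_1 + c_2 - 2\kappa_1 \\ \vdots \\ c_{2s-1} + c_{2s} - 2\kappa_s \end{bmatrix} = 0,$$

$$f_6 \equiv \frac{\partial L}{\partial \xi} = \begin{bmatrix} a_1 a_2 - b_1 b_2 - (\mu_1^2 + \nu_1^2) \\ \vdots \\ a_{2s-1} a_{2s} - b_{2s-1} b_{2s} - (\mu_s^2 + \nu_s^2) \end{bmatrix} = 0, \qquad f_7 \equiv \frac{\partial L}{\partial \sigma} = \begin{bmatrix} c_1 c_2 - d_1 d_2 - (\kappa_1^2 + \tau_1^2) \\ \vdots \\ c_{2s-1} c_{2s} - d_{2s-1} d_{2s} - (\kappa_s^2 + \tau_s^2) \end{bmatrix} = 0,$$

$$f_8 \equiv \frac{\partial L}{\partial v(X)} = (I \otimes A^T) \gamma + (I \otimes E^T) \epsilon + v(X\Delta) = 0, \qquad (16.33)$$

$$f_9 \equiv \frac{\partial L}{\partial v(Y_1)} = (I \otimes B^T) \gamma + 2\omega_1^2 v(Y_1) = 0, \qquad (16.34)$$

$$f_{10} \equiv \frac{\partial L}{\partial v(Y_2)} = (I \otimes B^T) \epsilon + 2\omega_1^2 v(Y_2) = 0, \qquad (16.35)$$

$$f_{11} \equiv \frac{\partial L}{\partial v(Z)} = -[(D_\alpha \otimes I) + (N_\alpha \otimes I)] \gamma - [(D_\beta \otimes I) + (N_\beta \otimes I)] \epsilon = 0, \qquad (16.36)$$

$$f_{12} \equiv \frac{\partial L}{\partial a} = \begin{bmatrix} \omega_1 \\ \omega_1 \\ \vdots \\ \omega_s \\ \omega_s \end{bmatrix} + \begin{bmatrix} \xi_1 a_2 \\ \xi_1 a_1 \\ \vdots \\ \xi_s a_{2s} \\ \xi_s a_{2s-1} \end{bmatrix} - \left[ \gamma_{n-2s+1}^T \oplus \cdots \oplus \gamma_n^T \right] v([z_{n-2s+1}, \ldots, z_n]) = 0, \qquad (16.37)$$

$$f_{13} \equiv \frac{\partial L}{\partial b} = - \begin{bmatrix} \xi_1 b_2 \\ \xi_1 b_1 \\ \vdots \\ \xi_s b_{2s} \\ \xi_s b_{2s-1} \end{bmatrix} - \left[ \gamma_{n-2s+1}^T \oplus \cdots \oplus \gamma_n^T \right] \Pi_s \, v([z_{n-2s+1}, \ldots, z_n]) = 0, \qquad (16.38)$$

$$f_{14} \equiv \frac{\partial L}{\partial c} = \begin{bmatrix} \theta_1 \\ \theta_1 \\ \vdots \\ \theta_s \\ \theta_s \end{bmatrix} + \begin{bmatrix} \sigma_1 c_2 \\ \sigma_1 c_1 \\ \vdots \\ \sigma_s c_{2s} \\ \sigma_s c_{2s-1} \end{bmatrix} - \left[ \epsilon_{n-2s+1}^T \oplus \cdots \oplus \epsilon_n^T \right] v([z_{n-2s+1}, \ldots, z_n]) = 0, \qquad (16.39)$$

$$f_{15} \equiv \frac{\partial L}{\partial d} = - \begin{bmatrix} \sigma_1 d_2 \\ \sigma_1 d_1 \\ \vdots \\ \sigma_s d_{2s} \\ \sigma_s d_{2s-1} \end{bmatrix} - \left[ \epsilon_{n-2s+1}^T \oplus \cdots \oplus \epsilon_n^T \right] \Pi_s \, v([z_{n-2s+1}, \ldots, z_n]) = 0, \qquad (16.40)$$

$$f_{16} \equiv \frac{\partial L}{\partial \mathrm{Stk}(N_\alpha)} = 2\omega_2^2 \, \mathrm{Stk}(N_\alpha) - d_1(Z, s)^T \gamma = 0, \qquad (16.41)$$

$$f_{17} \equiv \frac{\partial L}{\partial \mathrm{Stk}(N_\beta)} = 2\omega_2^2 \, \mathrm{Stk}(N_\beta) - d_1(Z, s)^T \epsilon = 0. \qquad (16.42)$$

We can obtain the symmetric gradient matrix $J_f$: 

$$J_f = \begin{bmatrix} 0 & J_2 \\ J_2^T & J_4 \end{bmatrix}$$

where, with block columns ordered as $v(X), v(Y_1), v(Y_2), v(Z), a, b, c, d, \mathrm{Stk}(N_\alpha), \mathrm{Stk}(N_\beta)$, 

$$J_2 = \begin{bmatrix}
\Omega_1 & \Psi & 0 & \Phi_1 & -d_4(Z,s) & -\tilde d_4(Z,s) & 0 & 0 & -d_1(Z,s) & 0 \\
\Omega_2 & 0 & \Psi & \Phi_2 & 0 & 0 & -d_4(Z,s) & -\tilde d_4(Z,s) & 0 & -d_1(Z,s) \\
S & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & d_5(e) & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & d_5(e) & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & d_5(a) & -d_5(b) & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & d_5(c) & -d_5(d) & 0 & 0
\end{bmatrix}$$

and (upper triangular blocks shown) 

$$J_4 = \begin{bmatrix}
\Delta \otimes I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & 2\omega_1^2 I & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & & 2\omega_1^2 I & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 & & & 0 & -d_4(R,s) & -d_6(R,s) & -d_4(W,s) & -d_6(W,s) & -d_8(R,s)^T & -d_8(W,s)^T \\
 & & & & d_7(\xi) & 0 & 0 & 0 & 0 & 0 \\
 & & & & & -d_7(\xi) & 0 & 0 & 0 & 0 \\
 & & & & & & d_7(\sigma) & 0 & 0 & 0 \\
 & & & & & & & -d_7(\sigma) & 0 & 0 \\
 & & & & & & & & 2\omega_2^2 I & 0 \\
 & & & & & & & & & 2\omega_2^2 I
\end{bmatrix}$$

with $\Omega_1 = I \otimes A$, $\Omega_2 = I \otimes E$, $\Psi = I \otimes B$, $\Phi_1 = -D_\alpha^T \otimes I - N_\alpha^T \otimes I$ and $\Phi_2 = -D_\beta^T \otimes I - 
N_\beta^T \otimes I$. Similar to Algorithm 3, we solve Optimization Problem 4 by Newton's 
iteration for real $F$, $G$: 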

Algorithm 4 (Complex Schur-Newton) 

1. Use SCHUR to find the initial $X_0$, $Y_{10}$, $Y_{20}$, $Z_0$, $N_{\alpha 0}$, $N_{\beta 0}$. 

2. Substitute $X_0$, $Y_{10}$, $Y_{20}$, $Z_0$, $N_{\alpha 0}$, $N_{\beta 0}$ into (16.33-16.42) to obtain $\{\gamma_0, \epsilon_0, \delta_0, 
\omega_0, \theta_0, \xi_0, \sigma_0, a_0, b_0, c_0, d_0\}$. 

3. With $\{\gamma_0, \epsilon_0, \delta_0, \omega_0, \theta_0, \xi_0, \sigma_0, a_0, b_0, c_0, d_0, v(X_0), v(Y_{10}), v(Y_{20}), v(Z_0), \mathrm{Stk}(N_{\alpha 0}), 
\mathrm{Stk}(N_{\beta 0})\}$ as starting values, run Newton's iteration until convergence to 
$X$, $Y_1$, $Y_2$, $Z$ and $N_\alpha$, $N_\beta$. 

4. Substitute $X$, $Y_1$, $Y_2$, $Z$ and $N_\alpha$, $N_\beta$ into (16.24) to obtain $F$, $G$. 

Remarks 

• At step 2, we set $a$, $b$, $c$, $d$ to be the same as the given eigenvalues, and obtain $\gamma_0$ 
by substituting $Z_0$, $N_{\alpha 0}$ into (16.41) and $\epsilon_0$ by substituting $Z_0$, $N_{\beta 0}$ into (16.42). 
Then from (16.33)-(16.40) we obtain $\delta_0$, $\omega_0$, $\theta_0$, $\xi_0$ and $\sigma_0$. 

• The starting point $\{\gamma_0, \epsilon_0, \delta_0, \omega_0, \theta_0, \xi_0, \sigma_0, a_0, b_0, c_0, d_0, v(X_0), v(Y_{10}), v(Y_{20}), 
v(Z_0), \mathrm{Stk}(N_{\alpha 0}), \mathrm{Stk}(N_{\beta 0})\}$ is often far away from being optimal. In such an 
event, we apply the GBB Gradient method [10] to decrease the objective 
function sufficiently, before Newton's iteration is applied. 
• At step 4, since the matrix $X$ is orthogonal, we can use $X^T$ in place of $X^{-1}$. 



16.5 Numerical Examples 



The examples are quoted from [7], some modified with the addition of a singular 
$E$. The first two examples are for ordinary systems, with the second one (Example 2) 
containing a complex conjugate pair of closed-loop poles. The last two are for 
descriptor systems, with Example 4 containing a complex conjugate pair of poles. The 
computations were performed using MATLAB [14] on a PC with accuracy eps 
$\approx 2.2(-16)$ (denoting $2.2 \times 10^{-16}$). Various weights $(\omega_1, \omega_2) = (0, 1)$, $(1, 0)$ and 
$(1, 1)$ have been tried to illustrate the feasibility of the algorithms. To save space, 
only the solution matrices corresponding to the last sets of weights $(\omega_1, \omega_2)$, with 
both weights being distinct and nonzero, are included here. For each set of 
weights, $\mathrm{Obj}_{\mathrm{SCHUR}}$ and $\mathrm{Obj}_{\mathrm{Newton}}$, the values of the objective function from SCHUR 
at the starting point and after Newton refinement respectively, are presented. 



Example 1: $n = 4$, $m = 2$, $\mathcal{L} = \{-2, -3, -4, -1\}$; $E = I_4$, 
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65 -19.5 


19.5' 




"65 
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0.1 
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-0.1 
-0.5 



-1 


, B = 










•) 







0.4 










0.4 _ 




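$\omega_1 = 0$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 35.41$, $\mathrm{Obj}_{\mathrm{Newton}} = 20.67$; 
$\omega_1 = 1$, $\omega_2 = 0$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 13.02$, $\mathrm{Obj}_{\mathrm{Newton}} = 6.049$; 
$\omega_1 = 1$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 73.35$, $\mathrm{Obj}_{\mathrm{Newton}} = 32.16$. 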



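Example 2: $n = 4$, $m = 2$, $\mathcal{L} = \{-29.4986, -10.0922, 2.5201 \pm 6.8910i\}$; $E = I_4$, 

$$A = \begin{bmatrix} 5.8765 & 9.3456 & 4.5634 & 9.3520 \\ 6.6526 & 0.5867 & 3.5829 & 0.6534 \\ 0.0000 & 9.6738 & 7.4876 & 4.7654 \\ 0.0000 & 0.0000 & 6.6784 & 2.5678 \end{bmatrix}, \qquad B = \begin{bmatrix} 3.9878 & 0.5432 \\ 0 & 2.7650 \\ 0 & 0 \\ 0 & 0 \end{bmatrix};$$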

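$\omega_1 = 0$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 13.07$, $\mathrm{Obj}_{\mathrm{Newton}} = 49.04$; 
$\omega_1 = 1$, $\omega_2 = 0$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 15.79$, $\mathrm{Obj}_{\mathrm{Newton}} = 14.71$; 
$\omega_1 = 1$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 20.65$, $\mathrm{Obj}_{\mathrm{Newton}} = 58.77$. 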



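Example 3: $n = 3$, $m = 2$, $\lambda_\alpha = \{1, 1, 1\}$, $\lambda_\beta = \{-1, -2, -3\}$; 

$\omega_1 = 0$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 186.9$, $\mathrm{Obj}_{\mathrm{Newton}} = 6.380$; 
$\omega_1 = 1$, $\omega_2 = 0$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 1.825$, $\mathrm{Obj}_{\mathrm{Newton}} = 0.833$; 
$\omega_1 = 1$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 23.36$, $\mathrm{Obj}_{\mathrm{Newton}} = 8.767$. 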

Example 4: $n = 5$, $m = 2$, $\lambda_\alpha = \{1, 1, 1, 1, 1\}$, $\lambda_\beta = \{-0.2, -0.5, -1, -1 \pm i\}$; 

-0.1094 0.0628 

1.306 -2.132 0.9807 

1.595 -3.149 1.547 

0.0355 2.632 -4.257 1.855 







0.0023 



0.1636 -0.1625 



E = 



10 

10 

1 

10 1 

10 





0.0638 

B = 0.0838 

0.1004 

0.0063 









0.1396 


-0.206 


0.0128 



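$\omega_1 = 0$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 4.609$, $\mathrm{Obj}_{\mathrm{Newton}} = 4.012$; 
$\omega_1 = 1$, $\omega_2 = 0$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 8968$, $\mathrm{Obj}_{\mathrm{Newton}} = 66.84$; 
$\omega_1 = 1$, $\omega_2 = 1$, $\mathrm{Obj}_{\mathrm{SCHUR}} = 1253$, $\mathrm{Obj}_{\mathrm{Newton}} = 78.46$. 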



Comments 



1. For Examples 1 and 3 with real eigenvalues, the starting vectors from SCHUR 
fall within the domain of convergence for the Real SCHUR-NEWTON algorithm. 
This coincides with our experience with other RPAP and RPAP_DS with 
real eigenvalues. Newton refinement produces a local minimum which 
improves the robustness measure substantially. 

2. For the RPAP_DS with complex eigenvalues like Example 4, the starting 
vectors from SCHUR are usually infeasible. Preliminary correction by Newton's 
iteration can be applied to the constraints in Optimization Problem 4, with 
gradient $J_2$. This produces a feasible starting vector for the Complex SCHUR-NEWTON 
algorithm. However, the improvement in the robustness measure is 
usually limited, as shown in Example 4. Apart from having an infeasible 
starting vector far from a local minimum, the main difficulty lies in 
finding accurate starting values for the Lagrange multipliers. However, 
improvements are still possible theoretically and achieved in practice. 

3. Similarly for the RPAP with complex eigenvalues like Example 2, the starting 
vectors from SCHUR are often infeasible. The feasible Newton refined solutions 
may then have objective function values greater than those for the infeasible 
starting points from SCHUR. 

4. The Newton method has been applied in this paper to optimize the robustness 
measures under constraints. Other methods, such as the augmented Lagrange 
method, may also be applied. Alternatives will be investigated elsewhere. 
5. The convergence of the Newton refinement is expected to be fast, assuming that 
the associated Jacobian is well behaved or when the starting vector is near the 
solution. Otherwise, the convergence will be slower. When the starting vector is 
poor, the GBB Gradient method has been applied. 



16.6 Epilogue 

The algorithm SCHUR [7] for state-feedback pole assignment, based on the Schur 
form, has been improved and extended for descriptor systems, minimizing a 
weighted sum of the departure from normality and feedback gain. Similar to the 
original SCHUR algorithm, the method appears to be efficient and numerically 
robust, controlling the conditioning of the closed-loop eigensystem and the feed- 
back gain. The method can be generalized further for second-order and periodic 
systems, as well as systems with output feedback. 

Acknowledgements We would like to thank Professor Shu-Fang Xu (Beijing University, China) 
for various interesting discussions and much encouragement. 
Chapter 17 

Synthesis of Fixed Structure Controllers 

for Discrete Time Systems 

Waqar A. Malik, Swaroop Darbha and S. P. Bhattacharyya 



Abstract In this paper, we develop a linear programming approach to the syn- 
thesis of stabilizing fixed structure controllers for a class of linear time invariant 
discrete-time systems. The stabilization of this class of systems requires the 
determination of a real controller parameter vector (or simply, a controller), K, so 
that a family of real polynomials, affine in the parameters of the controllers, is 
Schur. An attractive feature of the paper is the systematic approximation of the set 
of all such stabilizing controllers, K. This approximation is accomplished through 
the exploitation of the interlacing property of Schur polynomials and a systematic 
construction of sets of linear inequalities in K. The union of the feasible sets of 
linear inequalities provides an approximation of the set of all controllers, K, which 
render $P(z, K)$ Schur. Illustrative examples are provided to show the applicability 
of the proposed methodology. We also show a related result, namely, that the set of 
rational proper stabilizing controllers for single-input single-output linear time 
invariant discrete-time plants will form a bounded set in the controller parameter 
space if and only if the order of the stabilizing controller cannot be reduced any further. 
Moreover, if the order of the controller is increased, the set of higher order con- 
trollers will necessarily be unbounded. 
17.1 Introduction 

There is renewed interest in the synthesis of fixed-order stabilizing controllers 
for linear time invariant dynamical systems. The survey by Syrmos et al. [18] 
shows that this problem has attracted significant attention over the last four 
decades. Applications of the fixed-order stabilization problem can be found in 
the work of Buckley [7], Zhu et al. [19], and Bengtsson and Lindahl [2]. This 
problem may be simply stated as follows: Given a finite-dimensional LTI 
dynamical system, is there a stabilizing proper, rational controller of a given 
order (a causal controller of a given state-space dimension)? The set of all the 
stabilizing controllers of fixed order is the basic set in which all design must 
be carried out. 

Given the widespread use of fixed-order controllers in various applications 
(see Ref. [10], Chap. 6), it is important to understand whether fixed-order 
controllers that achieve a specified performance exist and if so, how one can 
compute the set of all such stabilizing controllers that achieve a specified per- 
formance. Unfortunately, the standard optimal design techniques result in con- 
trollers of higher order, and provide no control over the order or the structure of the 
controller. Moreover, the set of all fixed order/structure stabilizing controllers 
may be non-convex and, in general, disconnected in the space of controller 
parameters; see Ref. [1]. This is a major source of difficulty in its computation. 

A good survey of the attempts to solve the fixed order control problem and the 
related static output feedback (SOF) problem is given in Syrmos et al. [18], 
Blondel et al. [5], and Bernstein [3]. Henrion et al. [11] combine ideas from strict 
positive realness (SPRness), positive polynomials written as sum of squares (SOS) 
and LMIs to solve the problem of robust stabilization with fixed order controllers. 
The LMI approach for synthesizing a static output feedback (SOF) controller is 
also explored in Ghaoui et al. [9], and Iwasaki and Skelton [12]. 

Datta et al. [8] used the Hermite-Biehler theorem for obtaining the set of all 
stabilizing PID controllers for SISO plants. Discrete-time PID controllers have 
been designed by Keel et al. [13] using the Chebyshev representation and the 
interlacing property of Schur polynomials. They use root counting formulas and 
carry out a search for the separating frequencies by exploiting the structure of 
the PID control problem. The interlacing property of real and complex Hurwitz 
polynomials was used by Malik et al. [14, 16] to construct the set of stabilizing 
fixed order controllers that achieve certain specified criteria. 

In this paper, we focus on the problem of determining the set of all real 
controller parameters, $K = (k_1, k_2, \ldots, k_l)$, which render a real 
polynomial Schur, where each member of the set is of the form: 

$$P(z, K) = P_0(z) + \sum_{i=1}^{l} k_i P_i(z).$$ 

The paper is organized as follows: In Sect. 17.2, we describe the Chebyshev 
representation of polynomials and we provide the characterization for a 
polynomial, $P(z)$, to be Schur in terms of its Chebyshev representation. 
Section 17.3 deals with the generation of an outer approximation $S_o$ and an 
inner approximation $S_i$ of the set of controllers $S$, of a given structure, 
that stabilize a given linear time invariant discrete-time system. It is seen 
that $S_i \subset S \subset S_o$. Illustrative examples provided there show how 
the inner and outer approximations of the set of fixed structure stabilizing 
controllers may be constructed. Section 17.4 describes the boundedness property 
of the set of stabilizing controllers. In Sect. 17.5, we provide concluding 
remarks. 



17.2 Chebyshev Representation and Condition 
for a Polynomial to be Schur 

Let $P(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$ denote a real 
polynomial, that is, the coefficients $a_i$ are real numbers. We are interested 
in determining the root distribution of $P(z)$ with respect to the unit circle. 
The root distribution of $P(z)$ is necessary in characterizing the set of 
stabilizing controllers for a discrete-time control system. In such systems, 
$P(z)$ could denote the characteristic polynomial of the given discrete-time 
control system. Stability would require that all roots of $P(z)$ lie in the 
interior of the unit circle, i.e. $P(z)$ must be Schur. 



17.2.1 Chebyshev Representation of Polynomials 

We need to determine the image of the boundary of the unit circle under the 
action of the real polynomial $P(z)$: 

$$\{P(z) : z = e^{j\theta},\ 0 \le \theta \le 2\pi\}.$$ 

As the coefficients, $a_i$, of the polynomial $P(z)$ are real, $P(e^{j\theta})$ 
and $P(e^{-j\theta})$ are conjugate complex numbers. Hence, it is sufficient to 
determine the image of the upper half of the unit circle: 

$$\{P(z) : z = e^{j\theta},\ 0 \le \theta \le \pi\}.$$ 

By using $z^k|_{z = e^{j\theta}} = \cos k\theta + j \sin k\theta$, we have 

$$P(e^{j\theta}) = (a_n \cos n\theta + \cdots + a_1 \cos\theta + a_0) 
+ j(a_n \sin n\theta + \cdots + a_1 \sin\theta).$$ 

$\cos k\theta$ and $\sin k\theta / \sin\theta$ can be written as polynomials in 
$\cos\theta$ using Chebyshev polynomials. Using $u = -\cos\theta$, if 
$\theta \in [0, \pi]$ then $u \in [-1, 1]$. Now, 

$$e^{j\theta} = \cos\theta + j\sin\theta = -u + j\sqrt{1 - u^2}.$$ 
Let $\cos k\theta = c_k(u)$ and $\sin k\theta / \sin\theta = s_k(u)$, where 
$c_k(u)$ and $s_k(u)$ are real polynomials in $u$ and are known as the Chebyshev 
polynomials of the first and second kind, respectively. It is easy to show that 

$$s_k(u) = -\frac{1}{k}\frac{d c_k(u)}{du}, \quad k = 1, 2, \ldots \tag{17.1}$$ 

and that the Chebyshev polynomials satisfy the recursive relation 

$$c_{k+1}(u) = -u\,c_k(u) - (1 - u^2)\,s_k(u), \quad k = 1, 2, \ldots \tag{17.2}$$ 

Using (17.1) and (17.2), we can determine $c_k(u)$ and $s_k(u)$ for all $k$. 
From the above development, we see that 

$$P(e^{j\theta})\big|_{\theta = \cos^{-1}(-u)} = R(u) + j\sqrt{1 - u^2}\,T(u) =: P_c(u).$$ 

We refer to $P_c(u)$ as the Chebyshev representation of $P(z)$. $R(u)$ and 
$T(u)$ are real polynomials of degree $n$ and $n - 1$ respectively, with leading 
coefficients of opposite sign and equal magnitude. More explicitly, 

$$R(u) = a_n c_n(u) + \cdots + a_1 c_1(u) + a_0,$$ 

$$T(u) = a_n s_n(u) + a_{n-1} s_{n-1}(u) + \cdots + a_1 s_1(u).$$ 

The complex plane image of $P(z)$ as $z$ traverses the upper half of the unit 
circle can be obtained by evaluating $P_c(u)$ as $u$ runs from $-1$ to $+1$. 
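
The construction above can be checked numerically. The following is a minimal 
sketch, not taken from the chapter, that builds $c_k(u)$ and $s_k(u)$ from 
(17.1) and (17.2) and assembles $R(u)$ and $T(u)$. The function names and the 
sample polynomial are illustrative assumptions; coefficients are stored in 
ascending powers of $u$. 

```python
import numpy as np
from numpy.polynomial import polynomial as P


def chebyshev_c_s(n):
    """Coefficients (ascending powers of u) of c_k and s_k for k = 0..n."""
    c = [np.array([1.0]), np.array([0.0, -1.0])]  # c_0 = 1, c_1 = cos(theta) = -u
    s = [np.array([0.0]), np.array([1.0])]        # s_0 = 0, s_1 = 1
    for k in range(1, n):
        # Recursion (17.2): c_{k+1} = -u c_k - (1 - u^2) s_k
        c_next = P.polysub(P.polymul([0.0, -1.0], c[k]),
                           P.polymul([1.0, 0.0, -1.0], s[k]))
        # Relation (17.1): s_{k+1} = -(1/(k+1)) d c_{k+1} / du
        s_next = -P.polyder(c_next) / (k + 1)
        c.append(c_next)
        s.append(s_next)
    return c[:n + 1], s[:n + 1]


def chebyshev_representation(a):
    """Given a = [a_0, ..., a_n], return ascending coefficients of R(u), T(u)."""
    n = len(a) - 1
    c, s = chebyshev_c_s(n)
    R = np.zeros(n + 1)
    T = np.zeros(max(n, 1))
    for k in range(n + 1):
        R[:len(c[k])] += a[k] * c[k]
    for k in range(1, n + 1):
        T[:len(s[k])] += a[k] * s[k]
    return R, T


# Hypothetical sample polynomial P(z) = z^3 + 0.2 z^2 - 0.3 z + 0.1
a = [0.1, -0.3, 0.2, 1.0]
R, T = chebyshev_representation(a)
theta = 0.7
u = -np.cos(theta)
direct = P.polyval(np.exp(1j * theta), a)
via_chebyshev = P.polyval(u, R) + 1j * np.sqrt(1 - u**2) * P.polyval(u, T)
```

Here `direct` and `via_chebyshev` are two evaluations of the same point 
$P(e^{j\theta})$, and the last entries of `R` and `T` are the leading 
coefficients, which should have opposite sign and equal magnitude. 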



17.2.2 Root Distribution 



Let $\phi_P(\theta) := \arg[P(e^{j\theta})]$ denote the phase of $P(z)$ 
evaluated at $z = e^{j\theta}$ and let $\Delta_{\theta_1}^{\theta_2}[\phi_P(\theta)]$ 
denote the net change in phase of $P(e^{j\theta})$ as $\theta$ increases from 
$\theta_1$ to $\theta_2$. Similarly, let $\phi_{P_c}(u) := \arg[P_c(u)]$ denote 
the phase of $P_c(u)$ and $\Delta_{u_1}^{u_2}[\phi_{P_c}(u)]$ denote the net 
change in phase of $P_c(u)$ as $u$ increases from $u_1$ to $u_2$. 

Lemma 1 Let the real polynomial $P(z)$ have $i$ roots in the interior of the 
unit circle, and no roots on the unit circle. Then 

$$\Delta_{0}^{\pi}[\phi_P(\theta)] = \pi i = \Delta_{-1}^{+1}[\phi_{P_c}(u)].$$ 

Proof From geometric considerations it is easily seen that each interior root 
contributes $2\pi$ to $\Delta_{0}^{2\pi}[\phi_P(\theta)]$ and therefore, because 
of the symmetry of the roots about the real axis, the interior roots contribute 
$i\pi$ to $\Delta_{0}^{\pi}[\phi_P(\theta)]$. The second equality follows 
from the Chebyshev representation described above. □ 
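
Lemma 1 can be illustrated numerically by sampling the phase of 
$P(e^{j\theta})$ on $[0, \pi]$ and unwrapping it. The sketch below is only an 
illustration under stated assumptions: the function name, the sampling density 
and the test polynomial (roots 0.5 and -0.3 inside the unit circle, root -2 
outside) are hypothetical choices, not part of the chapter. 

```python
import numpy as np


def net_phase_change(coeffs_desc, samples=20001):
    """Net change of arg P(e^{j theta}) as theta increases from 0 to pi.

    coeffs_desc holds the coefficients of P in descending powers of z.
    """
    theta = np.linspace(0.0, np.pi, samples)
    values = np.polyval(coeffs_desc, np.exp(1j * theta))
    phase = np.unwrap(np.angle(values))  # remove the artificial 2*pi jumps
    return phase[-1] - phase[0]


# Hypothetical polynomial with two interior roots and one exterior root.
coeffs = np.poly([0.5, -0.3, -2.0])
interior_root_count = net_phase_change(coeffs) / np.pi  # close to 2
```

Dividing the net phase change by $\pi$ recovers the number of interior roots, 
provided the sampling is fine enough for the unwrapping to follow the curve. 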
17.2.3 Characterization of a Schur Polynomial in Terms of Its 
Chebyshev Representation 

Let $P(z)$ be a real polynomial of degree $n$. This polynomial will be said to 
be Schur if all $n$ roots lie within the unit circle. In this section, we 
characterize the Schur property of a polynomial in terms of its Chebyshev 
representation, $P(e^{j\theta}) = \bar{R}(\theta) + j\bar{T}(\theta) 
= R(u) + j\sqrt{1 - u^2}\,T(u)$, where $u = -\cos\theta$. 

Theorem 1 $P(z)$ is Schur if and only if, 

1. $R(u)$ has $n$ real distinct zeros $r_i$, $i = 1, 2, \ldots, n$ in $(-1, +1)$, 

2. $T(u)$ has $n - 1$ real distinct zeros $t_j$, $j = 1, 2, \ldots, n - 1$ in $(-1, +1)$, 

3. the zeros $r_i$ and $t_j$ interlace, i.e. 

$$-1 < r_1 < t_1 < r_2 < t_2 < \cdots < t_{n-1} < r_n < +1.$$ 

Proof Let 

$$t_j = -\cos\alpha_j, \quad \alpha_j \in (0, \pi), \quad j = 1, 2, \ldots, n - 1$$ 

or 

$$\alpha_j = \cos^{-1}(-t_j), \quad j = 1, 2, \ldots, n - 1, \qquad 
\alpha_0 = 0, \quad \alpha_n = \pi,$$ 

and let 

$$\beta_j = \cos^{-1}(-r_j), \quad j = 1, 2, \ldots, n, \quad \beta_j \in (0, \pi).$$ 

Then $(\alpha_0, \alpha_1, \ldots, \alpha_n)$ are the $n + 1$ zeros of 
$\bar{T}(\theta)$ and $(\beta_1, \beta_2, \ldots, \beta_n)$ are the $n$ zeros 
of $\bar{R}(\theta)$. The third condition means that $\alpha_j$ and $\beta_j$ 
satisfy 

$$0 = \alpha_0 < \beta_1 < \alpha_1 < \cdots < \alpha_{n-1} < \beta_n < \alpha_n = \pi.$$ 

This condition means that the plot of $P(e^{j\theta})$ for $\theta \in [0, \pi]$ 
turns counter-clockwise through exactly $2n$ quadrants. Therefore, 

$$\Delta_{0}^{\pi}[\phi_P(\theta)] = 2n \cdot \frac{\pi}{2} = n\pi,$$ 

and this condition is equivalent to $P(z)$ having $n$ zeros inside the unit 
circle. □ 
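
As an illustration of Theorem 1, the following sketch tests the three 
conditions numerically for a polynomial of degree $n \ge 1$. It is an assumed 
implementation rather than the authors' code: $R$ and $T$ are obtained from 
NumPy's Chebyshev-to-power-basis conversion, using $\cos k\theta = T_k(x)$ and 
$\sin k\theta / \sin\theta = T_k'(x)/k$ with $x = \cos\theta = -u$, and the 
tolerance `tol` is an arbitrary choice. 

```python
import numpy as np
from numpy.polynomial import chebyshev as C, polynomial as P


def chebyshev_rt(a):
    """Ascending coefficients of R(u), T(u) for a = [a_0, ..., a_n], u = -cos(theta)."""
    a = np.asarray(a, dtype=float)
    k = np.arange(1, len(a))
    r_x = C.cheb2poly(a)  # sum_k a_k T_k(x), with x = cos(theta)
    # sum_k a_k T_k'(x) / k, i.e. the sine part divided by sin(theta)
    t_x = P.polyder(C.cheb2poly(np.concatenate(([0.0], a[1:] / k))))
    flip = lambda p: p * (-1.0) ** np.arange(len(p))  # substitute x = -u
    return flip(r_x), flip(t_x)


def is_schur_by_interlacing(a, tol=1e-9):
    """Check conditions 1-3 of Theorem 1 numerically (degree n >= 1 assumed)."""
    a = np.asarray(a, dtype=float)
    n = len(a) - 1
    R, T = chebyshev_rt(a)
    r, t = P.polyroots(R), P.polyroots(T)
    if len(r) != n or len(t) != n - 1:
        return False
    if np.any(np.abs(np.imag(np.concatenate((r, t)))) > tol):
        return False  # some zero of R or T is not real
    merged = np.empty(2 * n - 1)
    merged[0::2] = np.sort(np.real(r))
    merged[1::2] = np.sort(np.real(t))
    chain = np.concatenate(([-1.0], merged, [1.0]))
    return bool(np.all(np.diff(chain) > tol))  # -1 < r_1 < t_1 < ... < r_n < 1


# Hypothetical test polynomials given by their roots (ascending coefficients).
schur_poly = np.real(np.poly([0.5, -0.3, 0.2 + 0.6j, 0.2 - 0.6j]))[::-1]
unstable_poly = np.real(np.poly([0.5, -0.3, 1.5]))[::-1]
```

Calling `is_schur_by_interlacing` on the two sample polynomials gives a quick 
comparison with the root locations used to construct them. 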



17.3 Synthesis of a Set of Stabilizing Controllers 

In this section, we seek to exploit the Interlacing Property (IP) of Schur polyno- 
mials to systematically generate inner and outer approximation of the set of sta- 
bilizing controllers, S. This approach leads to sets of linear programs (LPs). 
Let $P(z, K)$ be a real closed loop characteristic polynomial whose coefficients 
are affinely dependent on the design parameters $K$; one can define the 
Chebyshev representation through $P(e^{j\theta}, K) = R(u, K) + 
j\sqrt{1 - u^2}\,T(u, K)$, where $u = -\cos\theta$. $R(u, K)$ and $T(u, K)$ are 
real polynomials of degree $n$ and $n - 1$, respectively, and are affine in the 
controller parameter $K$. The leading coefficients of $R(u, K)$ and $T(u, K)$ 
are of opposite sign and are of equal magnitude. 



17.3.1 Inner Approximation 



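The stabilizing set of controllers, $S$, is the set of all controllers, $K$, 
that simultaneously satisfy the conditions of Theorem 1. The problem of 
rendering $P(z, K)$ Schur can be posed as a search for $2n - 2$ values of $u$. 
By way of notation, we represent the polynomials $R(u, K)$ and $T(u, K)$ 
compactly in the following form: 

$$R(u, K) = \begin{bmatrix} 1 & u & \cdots & u^{n} \end{bmatrix} A_R 
\begin{bmatrix} 1 \\ K \end{bmatrix}, \tag{17.3}$$ 

$$T(u, K) = \begin{bmatrix} 1 & u & \cdots & u^{n-1} \end{bmatrix} A_T 
\begin{bmatrix} 1 \\ K \end{bmatrix}. \tag{17.4}$$ 

In (17.3) and (17.4), $A_R$ and $A_T$ are real constant matrices that depend on 
the plant data and the structure of the controller sought; they are respectively 
of dimensions $(n + 1) \times (l + 1)$ and $n \times (l + 1)$, where $n$ is the 
degree of the characteristic polynomial and $l$ is the size of the controller 
parameter vector. For $i = 1, 2, 3, 4$, let $C_i$ and $S_i$ be diagonal matrices 
of size $2n$; for an integer $m$, the $(m + 1)$st diagonal entry of $C_i$ is 
$\cos\left(\frac{(2i - 1)\pi}{4} + \frac{m\pi}{2}\right)$ and the corresponding 
entry for $S_i$ is $\sin\left(\frac{(2i - 1)\pi}{4} + \frac{m\pi}{2}\right)$. 
For any given set of $2n - 2$ distinct values of $u$, 

$$-1 = u_0 < u_1 < \cdots < u_{2n-2} < u_{2n-1} = 1,$$ 

and for any integer $m$ define a Vandermonde-like matrix, 

$$V(u_0, u_1, \ldots, u_{2n-1}, m) := \begin{bmatrix} 
1 & u_0 & \cdots & u_0^{m} \\ 
1 & u_1 & \cdots & u_1^{m} \\ 
1 & u_2 & \cdots & u_2^{m} \\ 
\vdots & \vdots & & \vdots \\ 
1 & u_{2n-1} & \cdots & u_{2n-1}^{m} 
\end{bmatrix}.$$ 

We are now ready to characterize the set of stabilizing controllers K. 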
We are now ready to characterize the set of stabilizing controllers K. 

Theorem 2 There exists a real control parameter vector 
$K = (k_1, k_2, \ldots, k_l)$ so that the real polynomial $P(z, K)$ 

$$P(z, K) := P_0(z) + k_1 P_1(z) + \cdots + k_l P_l(z) 
= p_n(K) z^n + p_{n-1}(K) z^{n-1} + \cdots + p_0(K)$$ 

is Schur if and only if there exists a set of $2n - 2$ values, 
$-1 = u_0 < u_1 < u_2 < \cdots < u_{2n-2} < u_{2n-1} = 1$, so that one of the 
following two linear programs (LPs) is feasible: 

$$\begin{bmatrix} C_k V(u_0, u_1, \ldots, u_{2n-1}, n) A_R \\ 
S_k V(u_0, u_1, \ldots, u_{2n-1}, n - 1) A_T \end{bmatrix} 
\begin{bmatrix} 1 \\ K \end{bmatrix} > 0, \quad \text{for } k = 1, 3.$$ 

Proof The three conditions of Theorem 1 are equivalent to the existence of 
$2n - 2$ values of $u$, $-1 < u_1 < u_2 < \cdots < u_{2n-2} < 1$, such that the 
roots of the Chebyshev polynomial $R(u, K)$ lie in 

$$(-1, u_1), (u_2, u_3), (u_4, u_5), \ldots$$ 

while the roots of the other Chebyshev polynomial $T(u, K)$ lie in 

$$(u_1, u_2), (u_3, u_4), (u_5, u_6), \ldots$$ 

If $R(-1, K) > 0$, $T(-1, K) > 0$, then the placement of roots will require 

$$R(u_1, K) < 0, \quad R(u_2, K) < 0, \quad R(u_3, K) > 0, \ldots$$ 

and 

$$T(u_1, K) > 0, \quad T(u_2, K) < 0, \quad T(u_3, K) < 0, \ldots$$ 

In other words, the signs of $R(u_i, K)$ and $T(u_i, K)$ are the same as that of 
$\cos\left(\frac{\pi}{4} + i\frac{\pi}{2}\right)$ and 
$\sin\left(\frac{\pi}{4} + i\frac{\pi}{2}\right)$ respectively. This corresponds 
to the LP for $k = 1$. Similarly, for $R(-1, K) < 0$ and $T(-1, K) < 0$ we have 
the LP corresponding to $k = 3$. □ 

The essential idea is that the plot of the polynomial $P(e^{j\theta})$ must go 
through $2n$ quadrants in the counterclockwise direction as $\theta$ increases 
from $0$ to $\pi$. The conditions given above correspond to the plot starting in 
the $k$th quadrant at $\theta = 0^+$. 

The procedure to find the inner approximation is to partition the interval 
$(-1, 1)$ using more than $(2n - 2)$ points (either uniformly or by using an 
appropriate Chebyshev polynomial) and to search systematically for the 
feasibility of the obtained sets of linear inequalities. Every feasible LP 
yields a controller $K$ which makes the polynomial $P(z, K)$ Schur. The union of 
all the feasible sets of the LPs described above, for all possible sets of 
$(2n - 2)$ points in $(-1, 1)$, is the set of all stabilizing controllers. With 
partitioning $(-1, 1)$, however, one will be able to capture only finitely many 
of the possible sets of $(2n - 2)$ points, $u_1, \ldots, u_{2n-2}$. The feasible 
sets of the LPs corresponding to these finitely many possible sets will provide 
an inner approximation of the set of all stabilizing controllers. This 
approximation can be made more accurate by refining the partition; i.e., if $K$ 
is a stabilizing controller not in the approximate set, then there is a 
refinement of the partition [which will separate the roots of $R(u, K)$ and 
$T(u, K)$] from which one can pick $2n - 2$ points so that one of the two LPs 
corresponding to these points is feasible. This is the basic procedure for 
finding the inner approximation. 



17.3.2 Outer Approximation 

In the previous section, we outlined a procedure to construct LPs whose feasible 
set is contained in $S$. Their union $S_i$ is an inner approximation to $S$. For 
computation, it is useful to develop an outer approximation, $S_o$, that 
contains $S$. In this section, we present a procedure to construct an 
arbitrarily tight outer approximation $S_o$ as a union of the feasible sets of 
LPs. We propose to use Poincaré's generalization of Descartes' rule of signs. 

Theorem 3 (Poincaré's Generalization) Let $P(x)$ be a polynomial with real 
coefficients. The number of sign changes in the coefficients of 
$Q_k(x) := (x + 1)^k P(x)$ is a non-increasing function of $k$; for a 
sufficiently large $k$, the number of sign changes in the coefficients equals 
the number of real, positive roots of $P(x)$. 

The proof of the generalization due to Poincare is given in Polya and Szego [17]. 
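
The behaviour described in Theorem 3 is easy to observe numerically. The 
sketch below counts coefficient sign changes of $(x + 1)^k P(x)$ for increasing 
$k$; the function names and the sample polynomial 
$P(x) = (x - 1)(x^2 - x + 1) = x^3 - 2x^2 + 2x - 1$, which has exactly one real 
positive root, are illustrative assumptions. 

```python
import numpy as np


def sign_changes(coeffs, tol=1e-12):
    """Number of sign changes in a coefficient sequence, ignoring zeros."""
    signs = [np.sign(c) for c in coeffs if abs(c) > tol]
    return sum(1 for x, y in zip(signs, signs[1:]) if x != y)


def poincare_sign_changes(p, k_max):
    """Sign-change counts of (x + 1)^k p(x) for k = 0..k_max.

    p holds the coefficients of p(x) in ascending powers of x.
    """
    counts = []
    q = np.asarray(p, dtype=float)
    for _ in range(k_max + 1):
        counts.append(sign_changes(q))
        q = np.convolve(q, [1.0, 1.0])  # multiply by (x + 1)
    return counts


# Hypothetical example: x^3 - 2x^2 + 2x - 1 has one positive root (x = 1).
counts = poincare_sign_changes([-1.0, 2.0, -2.0, 1.0], 7)
```

For this example the count stays at 3 for $k = 0, \ldots, 6$ and drops to 1 at 
$k = 7$, which matches the single positive root. How large $k$ must be depends 
on the polynomial. 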

For the discussion on outer approximation, we will treat the polynomials, 
$R(\lambda, K)$ and $T(\lambda, K)$, as polynomials in $\lambda$ obtained 
through the bijective mapping $\lambda = \frac{1 + u}{1 - u}$. This maps the 
interval $(-1, +1)$ into the interval $(0, \infty)$. This mapping is applied in 
the following way: 

$$(1 + \lambda)^n\, Q\!\left(\frac{\lambda - 1}{\lambda + 1}\right) = Q(\lambda).$$ 

Let the $i$th roots of $R(\lambda, K)$ and $T(\lambda, K)$ be represented as 
$\lambda_{r,i}$ and $\lambda_{t,i}$ respectively. Since the polynomials $R$ and 
$T$ must have respectively $n$ and $n - 1$ real, positive roots, an application 
of Poincaré's result to the polynomials $R$ and $T$ yields the following: 

Lemma 2 If $K$ is a stabilizing control vector, then 
$(\lambda + 1)^{k-1} R(\lambda, K)$ and $(\lambda + 1)^{k-1} T(\lambda, K)$ have 
exactly $n$ and $n - 1$ sign changes in their coefficients respectively for 
every $k \ge 1$. 

The procedure in [4] corresponds to $k = 1$ of the above lemma. 
The following lemma takes care of the interlacing of the roots of two 
polynomials: 

Lemma 3 Let $K$ render a polynomial $P(z, K)$ Schur. Then the polynomial 
$Q(\lambda, K, \eta)$ has exactly $n$ real positive roots for all 
$\eta \in \mathbb{R}$. 
Fig. 17.1 Graph of the rational function $y$ of $\lambda$ considered in the 
proof of Lemma 3 

Proof The roots of $T(\lambda, K)$ and $R(\lambda, K)$ are real and positive and 
they interlace if and only if $Q(\lambda, K, \eta)$ has exactly $n$ real 
positive roots for all $\eta \in \mathbb{R}$. To prove sufficiency, we consider 
the graph of the rational function $y$ shown in Fig. 17.1 and consider its 
intersections with $y = \eta$. To prove necessity, we argue, via a root locus 
argument, that if the interlacing of real roots condition is violated, then for 
some value of $\eta \in \mathbb{R}$, the polynomial $Q(\lambda, K, \eta)$ will 
have at least a pair of complex conjugate roots. □ 

Lemmas 2 and 3 can be put together to show that an arbitrarily tight outer 
approximation can be constructed. 



Example 1 Consider the plant 

$$G(z) = \frac{z^2 - 2z + 1}{1.9z^2 + 2.1}.$$ 

It is desired to calculate the complete set of first order controllers of the 
form 

$$C(z) = \frac{k_1(z - 1)}{z + k_2}.$$ 

The characteristic polynomial is given by 

$$(1.9 + k_1)z^3 + (1.9k_2 - 3k_1)z^2 + (2.1 + 3k_1)z + (2.1k_2 - k_1).$$ 
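The Chebyshev polynomials are found to be: 

$$R(u, K) = -(7.6 + 4k_1)u^3 + (3.8k_2 - 6k_1)u^2 + 3.6u + 2k_1 + 0.2k_2,$$ 

$$T(u, K) = (7.6 + 4k_1)u^2 + (6k_1 - 3.8k_2)u + (2k_1 + 0.2).$$ 

These can be written in the compact form of (17.3) and (17.4) as: 

$$R(u, K) = \begin{bmatrix} 1 & u & u^2 & u^3 \end{bmatrix} 
\begin{bmatrix} 0 & 2 & 0.2 \\ 3.6 & 0 & 0 \\ 0 & -6 & 3.8 \\ -7.6 & -4 & 0 \end{bmatrix} 
\begin{bmatrix} 1 \\ k_1 \\ k_2 \end{bmatrix},$$ 

$$T(u, K) = \begin{bmatrix} 1 & u & u^2 \end{bmatrix} 
\begin{bmatrix} 0.2 & 2 & 0 \\ 0 & 6 & -3.8 \\ 7.6 & 4 & 0 \end{bmatrix} 
\begin{bmatrix} 1 \\ k_1 \\ k_2 \end{bmatrix}.$$ 

The above compact form allows the LPs to be formulated quite easily; the 
interval $[-1, 1]$ is partitioned and a systematic search for $2n - 2$ points in 
the interval is carried out. 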

Figure 17.2 displays the inner and outer approximation of the set of stabilizing 
controllers. The difference between outer approximation and inner approximation 
is the black colored region. The inner approximation is an excellent approximation 
of the complete set of stabilizing controllers. 
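
To make the LP formulation concrete, the following sketch sets up one LP of 
Theorem 2 for Example 1 with SciPy. It is a hedged illustration, not the 
authors' implementation: the function name, the particular grid points 
$u = (-1, -0.6, 0, 0.4, 0.6, 1)$ and the margin variable `delta` are 
assumptions. The strict inequalities are handled by maximising a common margin 
$\delta \le 1$, so the LP of Theorem 2 is feasible exactly when the optimal 
margin is positive. 

```python
import numpy as np
from scipy.optimize import linprog

# Example 1 data: R = [1 u u^2 u^3] A_R [1 k1 k2]^T, T = [1 u u^2] A_T [1 k1 k2]^T
A_R = np.array([[0.0, 2.0, 0.2],
                [3.6, 0.0, 0.0],
                [0.0, -6.0, 3.8],
                [-7.6, -4.0, 0.0]])
A_T = np.array([[0.2, 2.0, 0.0],
                [0.0, 6.0, -3.8],
                [7.6, 4.0, 0.0]])


def solve_sign_lp(u, k, A_R, A_T):
    """Return (K, delta) for the LP with index k in {1, 3}, or None if infeasible.

    u must be increasing with u[0] = -1 and u[-1] = 1.
    """
    u = np.asarray(u, dtype=float)
    m = np.arange(len(u))
    phase = (2 * k - 1) * np.pi / 4 + m * np.pi / 2
    sign_R = np.sign(np.cos(phase))  # diagonal of C_k, up to positive scaling
    sign_T = np.sign(np.sin(phase))  # diagonal of S_k, up to positive scaling
    V_R = np.vander(u, A_R.shape[0], increasing=True)
    V_T = np.vander(u, A_T.shape[0], increasing=True)
    M = np.vstack((sign_R[:, None] * (V_R @ A_R),
                   sign_T[:, None] * (V_T @ A_T)))
    # Variables x = (K, delta): maximise delta subject to M [1; K] >= delta.
    n_par = M.shape[1] - 1
    A_ub = np.hstack((-M[:, 1:], np.ones((M.shape[0], 1))))
    b_ub = M[:, 0]
    cost = np.zeros(n_par + 1)
    cost[-1] = -1.0
    bounds = [(None, None)] * n_par + [(None, 1.0)]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if res.status != 0 or res.x[-1] <= 0:
        return None
    return res.x[:n_par], res.x[-1]


K, delta = solve_sign_lp([-1.0, -0.6, 0.0, 0.4, 0.6, 1.0], 1, A_R, A_T)
k1, k2 = K
char_poly = [1.9 + k1, 1.9 * k2 - 3 * k1, 2.1 + 3 * k1, 2.1 * k2 - k1]
closed_loop_radius = max(abs(np.roots(char_poly)))
```

The value `closed_loop_radius` gives an independent check that the controller 
returned by the LP places all closed loop poles inside the unit circle. 
Repeating the call over many point sets drawn from a partition of $(-1, 1)$ and 
collecting the feasible sets is the search described in Sect. 17.3.1. 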




Fig. 17.2 Set of stabilizing controllers: an inner and outer approximation 
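The controller is considered to be of the following PID structure: 

$$C(z) = \frac{k_3 z^2 + k_2 z + k_1}{z^2 - z}.$$ 

The characteristic polynomial is 

$$z^4 - z^3 + (k_3 - 0.25)z^2 + (k_2 + 0.25)z + k_1.$$ 

The Chebyshev polynomials are: 

$$R(u, K) = 8u^4 + 4u^3 + (2k_3 - 8.5)u^2 - (k_2 + 3.25)u - k_3 + 1.25 + k_1,$$ 

$$T(u, K) = -8u^3 - 4u^2 - (2k_3 - 4.5)u + k_2 + 1.25.$$ 

In compact form, the Chebyshev polynomials can be represented as: 

$$R(u, K) = \begin{bmatrix} 1 & u & u^2 & u^3 & u^4 \end{bmatrix} 
\begin{bmatrix} 1.25 & 1 & 0 & -1 \\ -3.25 & 0 & -1 & 0 \\ -8.5 & 0 & 0 & 2 \\ 
4 & 0 & 0 & 0 \\ 8 & 0 & 0 & 0 \end{bmatrix} 
\begin{bmatrix} 1 \\ k_1 \\ k_2 \\ k_3 \end{bmatrix},$$ 

$$T(u, K) = \begin{bmatrix} 1 & u & u^2 & u^3 \end{bmatrix} 
\begin{bmatrix} 1.25 & 0 & 1 & 0 \\ 4.5 & 0 & 0 & -2 \\ -4 & 0 & 0 & 0 \\ 
-8 & 0 & 0 & 0 \end{bmatrix} 
\begin{bmatrix} 1 \\ k_1 \\ k_2 \\ k_3 \end{bmatrix}.$$ 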



An inner approximation of the set of controllers is shown in Fig. 17.3. An inner 
and outer approximation of the set of controllers are shown in Fig. 17.4. The inner 
approximation obviously lies inside the outer approximation, which is depicted 
using the lighter color. 



Fig. 17.3 Solution for Example 2: an inner approximation 

Fig. 17.4 Set of stabilizing controllers: an inner and outer approximation 



17.4 On the Boundedness of the Set of Stabilizing 
Controllers 

It is a known fact that an $n$th order plant can be stabilized by an 
$(n - 1)$th order controller [6]. The poles of the closed loop system can be 
freely assigned with such a controller, which can be obtained by using the 
inversion of the eliminant matrix of the plant. However, for a given controller 
order $r$ it is not clear if stabilization is possible. A related question is 
the minimal order of a stabilizing controller for a plant. In this section, we 
provide the first such characterization for discrete-time LTI systems through 
the boundedness of the set of controllers of a given order. In the previous 
section, we provided a systematic approach for synthesizing the inner and outer 
approximations of the set of stabilizing controllers, and such approximations 
can be brought to bear in the verification of the minimal order of 
stabilization. We show that the set of rational proper stabilizing controllers 
for single-input single-output linear time-invariant discrete-time plants will 
form a bounded set in the controller parameter space if and only if the order of 
the stabilizing controller cannot be reduced any further. Moreover, if the order 
of the controller is increased, the set of higher order controllers will 
necessarily be unbounded. The continuous-time counterparts of these results are 
presented in Malik et al. [15]. 

The following lemmas provide the key basis for the proposed characterization 
of stabilizing controllers. 

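Lemma 4 If $C_r(z) = \frac{N_r(z)}{D_r(z)}$ is an $r$th order rational, proper 
controller that stabilizes $P(z) = \frac{N_p(z)}{D_p(z)}$, then given any 
polynomials $\tilde{N}_r(z)$ and $\tilde{D}_r(z)$ of degree $r$, there is an 
$\epsilon^* > 0$ such that the $(r + 1)$th order proper, rational controller 

$$C_{r+1}(z) = \frac{\frac{1}{\epsilon} z N_r(z) + \tilde{N}_r(z)} 
{\frac{1}{\epsilon} z D_r(z) + \tilde{D}_r(z)}$$ 

also stabilizes $P(z)$ for every $0 < \epsilon < \epsilon^*$. 

Proof Let $\Delta(z) := N_p(z) N_r(z) + D_p(z) D_r(z)$. The characteristic 
polynomial, $\Delta_{pert}(z, \epsilon)$, associated with the perturbed 
controller, $C_{r+1}(z)$, is $\frac{1}{\epsilon} z \Delta(z) + 
(\tilde{N}_r(z) N_p(z) + \tilde{D}_r(z) D_p(z))$. If $\epsilon$ is treated as a 
variable in the following root locus problem, 

$$1 + \epsilon\, \frac{\tilde{N}_r(z) N_p(z) + \tilde{D}_r(z) D_p(z)}{z \Delta(z)} = 0,$$ 

it follows that there is an $\epsilon^* > 0$ such that for all 
$0 < \epsilon < \epsilon^*$, the polynomial $\Delta_{pert}(z, \epsilon)$ is 
Schur. □ 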
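
A small numerical experiment illustrates the root locus argument. The sketch 
below forms the perturbed characteristic polynomial 
$\frac{1}{\epsilon} z \Delta(z) + (\tilde{N}_r N_p + \tilde{D}_r D_p)$ for a 
hypothetical plant $P(z) = 1/(z - 2)$ with the static stabilizing controller 
$C_0(z) = 2$, and the arbitrary choices $\tilde{N}_0 = 0.5$, $\tilde{D}_0 = 1$. 
None of these numbers come from the chapter; they are assumptions chosen only to 
show the effect of $\epsilon$. 

```python
import numpy as np


def perturbed_char_poly(Np, Dp, Nr, Dr, Nt, Dt, eps):
    """Coefficients (descending powers of z) of (1/eps) z Delta + (Nt Np + Dt Dp).

    Delta = Np Nr + Dp Dr is the characteristic polynomial of the r-th order loop.
    """
    delta = np.polyadd(np.polymul(Np, Nr), np.polymul(Dp, Dr))
    lead = np.polymul([1.0, 0.0], delta) / eps          # (1/eps) z Delta(z)
    tail = np.polyadd(np.polymul(Nt, Np), np.polymul(Dt, Dp))
    return np.polyadd(lead, tail)


def spectral_radius(poly):
    """Largest root modulus of a polynomial given in descending powers."""
    return max(abs(np.roots(poly)))


# Hypothetical data: plant 1/(z - 2), controller C_0 = 2 (so Delta(z) = z).
Np, Dp = [1.0], [1.0, -2.0]
Nr, Dr = [2.0], [1.0]
Nt, Dt = [0.5], [1.0]

radius_small_eps = spectral_radius(perturbed_char_poly(Np, Dp, Nr, Dr, Nt, Dt, 0.1))
radius_large_eps = spectral_radius(perturbed_char_poly(Np, Dp, Nr, Dr, Nt, Dt, 5.0))
```

For the small value of $\epsilon$ the perturbed polynomial remains Schur, while 
for the large value it does not, which is consistent with the existence of a 
finite threshold $\epsilon^*$. 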

The following are consequences of Lemma 4: 

1. If there is an $r$th order stabilizing controller, then there is a 
stabilizing controller of order $r + 1$. Therefore, there is no gap in the order 
of stabilization. Hence, minimal order compensators can be synthesized by 
recursively reducing the order of stabilizing controllers by one. 

2. Let us consider a class of rational proper controllers of the form 

$$C_r(z) = \frac{k_0 + k_1 z + \cdots + k_r z^r} 
{1 + k_{r+2} z + \cdots + k_{2r} z^{r-1} + k_{2r+1} z^r}$$ 

and associate it with a vector 

$$K = (k_0, k_1, \ldots, k_r, 1, k_{r+2}, \ldots, k_{2r}, k_{2r+1}).$$ 

Clearly, there is a one-to-one correspondence between $K \in \mathbb{R}^{2r+2}$ 
and a rational, proper $r$th order controller $C_r(z)$. Note that we have fixed 
the $k_{r+1}$ entry to be one. Without any loss of generality, we will use $K$ 
and $C_r(z)$ interchangeably. 

Let $\tilde{N}_r(z) = \tilde{k}_0 + \tilde{k}_1 z + \cdots + \tilde{k}_r z^r$ 
and $\tilde{D}_r(z) = 1 + \tilde{k}_{r+2} z + \cdots + \tilde{k}_{2r} z^{r-1} + 
\tilde{k}_{2r+1} z^r$, so that, by Lemma 4, there is an $\epsilon^*$ such that 
for all $0 < \epsilon < \epsilon^*$, the following $(r + 1)$th order controller, 
$\hat{C}_{r+1}(z)$, is also stabilizing: 

$$\hat{C}_{r+1}(z) = \frac{\tilde{k}_0 + (\tilde{k}_1 + \frac{1}{\epsilon} k_0) z 
+ \cdots + (\tilde{k}_r + \frac{1}{\epsilon} k_{r-1}) z^r 
+ \frac{1}{\epsilon} k_r z^{r+1}} 
{1 + (\tilde{k}_{r+2} + \frac{1}{\epsilon}) z + \cdots 
+ (\tilde{k}_{2r+1} + \frac{1}{\epsilon} k_{2r}) z^r 
+ \frac{1}{\epsilon} k_{2r+1} z^{r+1}}.$$ 

Hence the associated vector, $\hat{K} \in \mathbb{R}^{2r+4}$, is 

$$\hat{K}(\epsilon) = \left(\tilde{k}_0,\ \tilde{k}_1 + \tfrac{1}{\epsilon} k_0,\ 
\ldots,\ \tilde{k}_r + \tfrac{1}{\epsilon} k_{r-1},\ \tfrac{1}{\epsilon} k_r,\ 1,\ 
\tilde{k}_{r+2} + \tfrac{1}{\epsilon} k_{r+1},\ \ldots,\ 
\tilde{k}_{2r+1} + \tfrac{1}{\epsilon} k_{2r},\ \tfrac{1}{\epsilon} k_{2r+1}\right).$$ 

Define $K_0 := \hat{K}(\epsilon^*)$, $\lambda := \frac{1}{\epsilon} - 
\frac{1}{\epsilon^*}$, and let $K_1$ be 

$$K_1 := (0, k_0, k_1, \ldots, k_r, 0, 1, k_{r+2}, \ldots, k_{2r}, k_{2r+1}).$$ 

Then, $\hat{K} = K_0 + \lambda K_1$ is stabilizing for every $\lambda > 0$, by 
Lemma 4. Thus, $\hat{K}$ is a ray originating at $K_0$ and is in the direction 
of $K_1$ in the space of parameters of $(r + 1)$th order proper stabilizing 
controllers. Two things can be inferred from the above: 

(a) If an $r$th order stabilizing compensator exists, the set of $(r + 1)$th 
order proper stabilizing controller parameters is unbounded. In particular, the 
set of $(r + 1)$th order proper stabilizing controllers contains a ray of the 
form $K_0 + \lambda K_1$ in $\mathbb{R}^{2r+4}$ that is stabilizing for every 
$\lambda > 0$. The converse is in general not true. 

(b) If, by some means, one were to find a ray, $\{K_0 + \lambda K_1, 
\lambda > 0\}$, of proper $(r + 1)$st order stabilizing controllers, with $K_1$ 
having the first and $(r + 2)$nd entry to be zero and $k_{r+1} = 1$, then it 
seems likely that a lower order controller can be recovered from $K_1$ 
considering the correspondence between $K_1$ and $C(z)$. 

Note that the class of stabilizing controllers we consider consists of 
controllers whose denominators have a constant term (the $k_{r+1}$ entry) equal 
to unity. If a stabilizing controller $C_r(z)$ has a pole at 0, then one can 
always construct a stabilizing controller of the same order without a pole at 
zero by a slight perturbation. For this reason, there is no loss of generality 
in assuming that $C_r(z)$ has no poles at 0. 

The following theorem provides the conditions for the existence of a lower- 
order controller from the unboundedness of the set of higher-order controllers. 

Theorem 4 A proper controller of order $r$ stabilizing $P(z)$ exists if there 
exists a $\rho \in (0, 1)$ and a ray of proper stabilizing controllers of order 
$r + 1$, namely $\{K_0 + \lambda K_1, \lambda > 0\}$, that place the closed loop 
poles inside the disk $D_\rho$. 

Proof A controller of order $n - 1$ always exists for a SISO plant of order $n$. 
Hence, we will assume that $r \le n - 2$. 

Necessity: Suppose an $r$th-order proper controller, $C(z)$, stabilizes the 
plant, $P(z)$, and let $p_i$ be the roots of the closed loop polynomial. Define 
$\rho_0 := \max_i(|p_i|)$ and define $\rho := \frac{1 + \rho_0}{2}$. By Lemma 4, 
there exists an $\epsilon^*$, such that for all $0 < \epsilon < \epsilon^*$, the 
$(r + 1)$th order controller, 

$$C_{r+1}(z) = \frac{\frac{1}{\epsilon} z}{\frac{1}{\epsilon} z + 1}\, C(z),$$ 

stabilizes the plant $P(z)$ and places the poles of the closed loop inside the 
disk $D_\rho$. If $C(z)$ is of the form 

$$C(z) = \frac{c_0 + c_1 z + \cdots + c_r z^r} 
{1 + d_1 z + \cdots + d_{r-1} z^{r-1} + d_r z^r}$$ 

then 

$$C_{r+1}(z) = \frac{\frac{z}{\epsilon}(c_0 + c_1 z + \cdots + c_r z^r)} 
{\frac{z}{\epsilon}(1 + d_1 z + \cdots + d_{r-1} z^{r-1} + d_r z^r) 
+ (1 + d_1 z + \cdots + d_r z^r)}.$$ 

In the parameter space of $(r + 1)$th order controllers, it is of the form 
$K_0 + \lambda K_1$, where 

$$\lambda = \frac{1}{\epsilon} - \frac{1}{\epsilon^*},$$ 

$$K_1 = (0, c_0, c_1, \ldots, c_r, 0, 1, d_1, \ldots, d_{r-1}, d_r),$$ 

$$K_0 = \frac{1}{\epsilon^*} K_1 + (\underbrace{0, \ldots, 0}_{r+2 \text{ zeros}}, 
1, d_1, \ldots, d_{r-1}, d_r, 0),$$ 

and this ray of controllers, $\{K_0 + \lambda K_1, \lambda > 0\}$, stabilizes 
the plant $P(z)$ and places the closed loop poles inside the disk $D_\rho$. 

Sufficiency: Consider a ray of controllers of order $r + 1$ as given below: 

$$C(z, \lambda) = \frac{\lambda z N_c(z) + N^*(z)}{\lambda z D_c(z) + D^*(z)},$$ 

where 

$$N_c(z) = c_0 + c_1 z + c_2 z^2 + \cdots + c_r z^r,$$ 
$$N^*(z) = e_0 + e_1 z + e_2 z^2 + \cdots + e_{r+1} z^{r+1},$$ 
$$D_c(z) = d_0 + d_1 z + d_2 z^2 + \cdots + d_r z^r,$$ 
$$D^*(z) = f_0 + f_1 z + f_2 z^2 + \cdots + f_{r+1} z^{r+1}.$$ 

As shown above, we require $D_c(z)$ to be of order $r$. Suppose this ray of 
controllers $C(z, \lambda)$ of order $r + 1$ stabilizes the plant $P(z)$ and 
places the closed loop poles inside the disk $D_\rho$ for some 
$\rho \in (0, 1)$. If $P(z) = \frac{N_p(z)}{D_p(z)}$ then the closed loop 
characteristic polynomial for the plant $P(z)$ with a controller from the ray 
(identified by $\lambda$) may be written as: 

$$\Delta(P(z), \lambda) = \lambda z \Delta_0(z) + \Delta_1(z),$$ 

where $\Delta_0(z) = N_p(z) N_c(z) + D_p(z) D_c(z)$ and 
$\Delta_1(z) = N_p(z) N^*(z) + D_p(z) D^*(z)$. Since $\Delta(P(z), \lambda)$ is 
Schur for all $\lambda > 0$, it must be true that $\Delta_0(z)$ is Schur. 

Hence, $C(z) = \frac{N_c(z)}{D_c(z)}$ is a lower order controller stabilizing 
the plant $P(z)$. □ 

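Example 3 Consider the plant: 

$$G(z) = \frac{z - 2}{z^3 - 8z^2 + 19z - 12}.$$ 

A first order controller of the following form is considered: 

$$C(z) = \frac{k_1 z + k_2}{k_3 z + 1}.$$ 

The characteristic polynomial is 

$$k_3 z^4 + (1 - 8k_3)z^3 + (k_1 - 8 + 19k_3)z^2 + (-2k_1 + 19 + k_2 - 12k_3)z - 12 - 2k_2.$$ 

The Chebyshev polynomials are: 

$$R(u) = 8k_3 u^4 + (-4 + 32k_3)u^3 + (2k_1 - 16 + 30k_3)u^2 
+ (2k_1 - k_2 - 16 - 12k_3)u + (-2k_2 - k_1 - 18k_3 - 4),$$ 

$$T(u) = -8k_3 u^3 + (4 - 32k_3)u^2 + (16 - 2k_1 - 34k_3)u + 18 - 4k_3 - 2k_1 + k_2.$$ 

In compact form, the Chebyshev polynomials can be represented as 

$$R(u, K) = \begin{bmatrix} 1 & u & u^2 & u^3 & u^4 \end{bmatrix} 
\begin{bmatrix} -4 & -1 & -2 & -18 \\ -16 & 2 & -1 & -12 \\ -16 & 2 & 0 & 30 \\ 
-4 & 0 & 0 & 32 \\ 0 & 0 & 0 & 8 \end{bmatrix} 
\begin{bmatrix} 1 \\ k_1 \\ k_2 \\ k_3 \end{bmatrix},$$ 

$$T(u, K) = \begin{bmatrix} 1 & u & u^2 & u^3 \end{bmatrix} 
\begin{bmatrix} 18 & -2 & 1 & -4 \\ 16 & -2 & 0 & -34 \\ 4 & 0 & 0 & -32 \\ 
0 & 0 & 0 & -8 \end{bmatrix} 
\begin{bmatrix} 1 \\ k_1 \\ k_2 \\ k_3 \end{bmatrix}.$$ 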

Outer and inner approximations of the set of controllers are shown in Figs. 17.5 
and 17.6 respectively. The outer approximation is bounded and hence the minimal 
order of stabilizing controller for the given plant is one. 
Fig. 17.5 Solution for Example 3: an outer approximation 

Fig. 17.6 Solution for Example 3: an inner approximation 

17.5 Conclusions 

In this paper, we considered the problem of synthesis of fixed order and structure 
controllers, where the coefficients of the closed loop characteristic polynomial are 
linear in the parameters of the controller. A novel feature of this paper is the 
systematic exploitation of the interlacing property of Schur polynomials and the 
use of Poincare's generalization of Descartes' rule of signs to generate LPs in the 
parameters of a fixed order controller. The feasible set of any LP generated for 
an inner approximation of the set of all stabilizing controllers can be indexed 
by a set of $2n - 2$ increasing values, $-1 = u_0 < u_1 < u_2 < \cdots < 
u_{2n-2} < u_{2n-1} = 1$; in particular, any controller in the feasible set of 
LPs places the roots of the Chebyshev polynomials of $P(z, K)$ alternately in 
the intervals $(u_i, u_{i+1})$, $i = 0, \ldots, 2n - 2$. The problem of inner 
approximation of the set of stabilizing controllers is then posed as the search 
for all ordered $(2n - 2)$-tuples of points for which the associated LP is 
feasible; the union of the feasible sets of all such LPs is an inner approximation 
for the set of all stabilizing controllers. The proposed methodology naturally 
extends to the computation of the set of simultaneously stabilizing controllers. We 
also show that the set of proper stabilizing controllers of order r is not empty and is 
bounded iff r is the minimal order of stabilization for the plant. We provide 
examples to illustrate some of the results. 



References 



1. Ackermann J (1993) Robust control systems with uncertain physical parameters. 
Springer, Berlin 

2. Bengtsson G, Lindahl S (1974) A design scheme for incomplete state or output 
feedback with applications to boiler and power system control. Automatica 
10:15-30 

3. Bernstein D (1992) Some open problems in matrix theory arising in linear 
systems and control. Linear Algebra Appl 162-164:409-432 

4. Bhattacharyya SP, Keel LH, Howze J (1988) Stabilizability conditions using 
linear programming. IEEE Trans Automat Contr 33:460-463 

5. Blondel V, Gevers M, Lindquist A (1995) Survey on the state of systems and 
control. European J Contr 1:5-23 

6. Brasch FM, Pearson JB (1970) Pole placement using dynamic compensator. IEEE 
Trans Automat Contr AC-15:34-43 

7. Buckley A (1995) Hubble telescope pointing control system design improvement 
study. J Guid Control Dyn 18:194-199 

8. Datta A, Ho MT, Bhattacharyya SP (2000) Structure and synthesis of PID 
controllers. Springer, London 

9. El Ghaoui L, Oustry F, AitRami M (1997) A cone complementarity linearization 
algorithm for static output feedback and related problems. IEEE Trans Automat 
Contr 42(8):1171-1176 

10. Goodwin GC, Graebe SF, Salgado ME (2001) Control system design. 
Prentice-Hall, Upper Saddle River 

11. Henrion D, Sebek M, Kucera V (2003) Positive polynomials and robust 
stabilization with fixed-order controllers. IEEE Trans Automat Contr 
48(7):1178-1186 

12. Iwasaki T, Skelton RE (1995) The XY-centering algorithm for the dual LMI 
problem: a new approach to fixed order control design. Int J Contr 
62(6):1257-1272 

13. Keel LH, Rego JI, Bhattacharyya SP (2003) A new approach to digital PID 
controller design. IEEE Trans Automat Contr 48(4):687-692 

14. Malik WA, Darbha S, Bhattacharyya SP (2004) On the synthesis of fixed 
structure controllers satisfying given performance criteria. In: 2nd IFAC 
Symposium on System, Structure and Control 

15. Malik WA, Darbha S, Bhattacharyya SP (2007a) On the boundedness of the set 
of stabilizing controllers. Int J Robust Nonlinear Contr, page in print 

16. Malik WA, Darbha S, Bhattacharyya SP (2007b) A linear programming approach 
to the synthesis of fixed structure controllers. IEEE Trans Automat Contr, page 
in print 

17. Polya G, Szego G (1998) Problems and theorems in analysis II: theory of 
functions, zeros, polynomials, determinants, number theory, geometry. Springer, 
Berlin 

18. Syrmos VL, Abdallah CT, Dorato P, Grigoriadis K (1997) Static output 
feedback: a survey. Automatica 33(2):125-137 

19. Zhu G, Grigoriadis K, Skelton R (1995) Covariance control design for the 
Hubble space telescope. J Guid Control Dyn 18(2):230-236 



Chapter 18 

A Secant Method for Nonlinear Matrix 

Problems 

Marlliny Monsalve and Marcos Raydan 



Abstract Nonlinear matrix equations arise in different scientific topics, such as 
applied statistics and control theory, among others. Standard approaches to solve 
them include and combine some variations of Newton's method, matrix factorizations, 
and reduction to generalized eigenvalue problems. In this paper we explore the 
use of secant methods in the space of matrices, which represent a new approach with 
interesting features. For the special problem of computing the inverse or the 
pseudoinverse of a given matrix, we propose a specialized secant method for which 
we establish stability and q-superlinear convergence, and for which we also present 
some numerical results. In addition, for solving quadratic matrix equations, we 
discuss several issues, and present preliminary and encouraging numerical 
experiments. 

18.1 Introduction 

The aim of this paper is to present a secant method for solving the following 
matrix nonlinear problem: 

given $F : \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}$, find $X_* \in \mathbb{C}^{n \times n}$ such that $F(X_*) = 0$, (18.1) 

where $F$ is a Fréchet differentiable map. In what follows, we denote by $F'$ the 
Fréchet derivative of $F$. 

This problem appears in different applications. For instance, given the matrices 
$A$, $B$, and $C$, the quadratic matrix equation $AX^2 + BX + C = 0$ arises in control 
theory [1, 2, 5, 9]. Another application is to compute the inverse or the 
pseudoinverse of any given matrix $A$. Indeed, if we find the root of $F(X) = X^{-1} - A$, we 
obtain the inverse of $A$, and iterative schemes based on Newton's method can be 
applied for finding the inverse or the pseudoinverse of any matrix [13, 15]. For 
additional applications and further results concerning nonlinear matrix problems 
see [8, 11]. 

A useful tool for solving Eq. (18.1) is the well-known Newton's method: 

Algorithm 1: Newton's method 

Given $X_0 \in \mathbb{C}^{n \times n}$ 
For $k = 0, 1, \ldots$ 
    Solve $F'(X_k) S_k = -F(X_k)$   /* for $S_k$ */ 
    Set $X_{k+1} = X_k + S_k$ 
End For 

Note that we need $F'$ to find $S_k$ in each step of Algorithm 1, and in order to 
obtain $F'$ we can use the Taylor series for $F$ about $X$, $F(X + S) = F(X) + 
F'(X)S + R(S)$, where $R(S)$ is such that 

$$\lim_{\|S\| \to 0} \frac{\|R(S)\|}{\|S\|} = 0.$$ 

The Taylor series allows us to identify the application of F'{X) on S which is 
required to solve the linear equation of step 1 in Algorithm 1. In many cases, 
solving that linear equation is computationally expensive (see, e.g., [16]). 
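
To make the cost of the Newton step concrete, the following sketch works it out for the quadratic map $F(X) = AX^2 + BX + C$ mentioned above. Expanding $F(X + S)$ gives $F'(X)S = A(XS + SX) + BS$, and writing this with Kronecker products leads to a dense $n^2 \times n^2$ linear system. This is only an illustration under stated assumptions: the function name `newton_step_quadratic` and the small test matrices are hypothetical, and a practical code would use a Sylvester-type solver rather than forming the Kronecker matrix.

```python
import numpy as np

def newton_step_quadratic(A, B, C, X):
    """One Newton step S for F(X) = A X^2 + B X + C (illustrative sketch)."""
    n = X.shape[0]
    I = np.eye(n)
    FX = A @ X @ X + B @ X + C
    # F'(X) S = (A X + B) S + A S X.  With column-major vec(.) this reads
    # (I kron (A X + B) + X^T kron A) vec(S), an n^2 x n^2 system.
    K = np.kron(I, A @ X + B) + np.kron(X.T, A)
    s = np.linalg.solve(K, -FX.flatten(order="F"))
    return s.reshape((n, n), order="F"), K

# Hypothetical small test data (tridiagonal B and C, A = I).
n = 4
A = np.eye(n)
B = 30.0 * np.eye(n) - 10.0 * (np.eye(n, k=1) + np.eye(n, k=-1))
B[0, 0] = B[-1, -1] = 20.0
C = 15.0 * np.eye(n) - 5.0 * (np.eye(n, k=1) + np.eye(n, k=-1))
X = 2.0 * np.eye(n)

S, K = newton_step_quadratic(A, B, C, X)
FX = A @ X @ X + B @ X + C
# The step satisfies the linearized equation F'(X) S + F(X) = 0.
lin_residual = A @ (X @ S + S @ X) + B @ S + FX
```

Even for this tiny example the coefficient matrix `K` has $n^2 = 16$ rows and columns, which is the source of the cost referred to in the text.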

As a general principle, whenever Newton's method is applicable, a suitable 
secant method can be obtained, which hopefully has interesting features to exploit. 
For example, in the well-known scalar case, the secant method does not require the 
derivative, and only uses function evaluations. In that case ($f : \mathbb{C} \to \mathbb{C}$), the secant 
method can be written as follows: 

$$x_{k+1} = x_k - \frac{f(x_k)}{a_k},$$ 

where $a_k$ satisfies $f(x_k) = f(x_{k-1}) + a_k(x_k - x_{k-1})$ for $k \geq 0$, and $x_{-1}, x_0 \in \mathbb{C}$ 
are given. 
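
The scalar iteration can be sketched in a few lines. The function name `secant`, the tolerance, and the test function $f(x) = x^2 - 2$ are illustrative choices, not taken from the text.

```python
def secant(f, x_prev, x, tol=1e-12, max_iter=50):
    """Scalar secant iteration x_{k+1} = x_k - f(x_k) / a_k (illustrative sketch)."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            break
        # a_k is the slope of the line through the last two iterates,
        # so that f(x_k) = f(x_{k-1}) + a_k (x_k - x_{k-1}).
        a = (fx - f(x_prev)) / (x - x_prev)
        x_prev, x = x, x - fx / a
    return x

root = secant(lambda x: x * x - 2.0, 1.0, 2.0)
```

Only function values are used; no derivative of `f` is ever evaluated.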

Moreover, an extension for nonlinear systems of equations ($F : \mathbb{C}^n \to \mathbb{C}^n$) can 
be written as 

$$x_{k+1} = x_k - A_k^{-1} F(x_k),$$ 

where $A_k \in \mathbb{C}^{n \times n}$ satisfies $F(x_k) = F(x_{k-1}) + A_k(x_k - x_{k-1})$ for $k \geq 1$, and the 
vector $x_0$ and the matrix $A_0$ are given. There are infinitely many options for 
building the matrix $A_k$ at every iteration. In particular, Broyden's family of 
quasi-Newton methods avoids the knowledge of the Jacobian and has produced a 
significant body of research for many different problems (see, e.g., [4, 12]). 

In this work we develop secant methods for nonlinear matrix problems that 
inherit, as much as possible, the features of the classical secant methods in previous 
scenarios (e.g., scalar equations, nonlinear algebraic systems of equations). 
The rest of this document is organized as follows. In Sect. 18.2 we propose a 
general secant method for matrix problems which is based on the standard secant 
method, and we also describe some of its variations. In Sect. 18.3 we propose a 
specialized secant method for approximating the inverse or the pseudoinverse of a 
matrix. The global convergence and the stability are proved for this specialized 
secant method. We present numerical experiments for computing the inverse of 
some given nonsingular matrices, and for computing the pseudoinverse of a singular 
matrix. In Sect. 18.4 we consider the application of the general secant 
algorithms for solving quadratic matrix equations, and we also present some 
encouraging preliminary numerical results. Finally, in Sect. 18.5, we present some 
conclusions and perspectives. 



18.2 A Secant Equation for Matrix Problems 

A general secant method for solving (18.1) should be given by the following 
iteration 

$$X_{k+1} = X_k - A_k^{-1} F(X_k), \tag{18.2}$$ 

where $X_{-1} \in \mathbb{C}^{n \times n}$ and $X_0 \in \mathbb{C}^{n \times n}$ are given, and $A_{k+1}$ is a suitable linear operator 
that satisfies 

$$A_{k+1} S_k = Y_k, \tag{18.3}$$ 

where $S_k = X_{k+1} - X_k$ and $Y_k = F(X_{k+1}) - F(X_k)$. Equation (18.3) is known as 
the secant equation. 

Once $X_{k+1}$ has been obtained, we observe in (18.3) that $A_{k+1}$ can be computed 
at each iteration by solving a linear system of $n^2$ equations. Therefore, there is a 
resemblance with the scalar case, in which one equation is required to find one 
unknown. Similarly, we notice that one $n \times n$ matrix is enough to satisfy the 
matrix secant Eq. (18.3). Hence, we force the operator $A_k$ to be a matrix of the 
same dimension as the step $S_k$ and the map-difference $Y_k$, as in the scalar case. 
The proposed algorithm, and some important variants, can be summarized as 
follows: 

Algorithm 2: General secant method for matrix problems 

Given $X_{-1} \in \mathbb{C}^{n \times n}$, $X_0 \in \mathbb{C}^{n \times n}$ 
Set $S_{-1} = X_0 - X_{-1}$ 
Set $Y_{-1} = F(X_0) - F(X_{-1})$ 
Solve $A_0 S_{-1} = Y_{-1}$   /* for $A_0$ */ 
For $k = 0, 1, \ldots$ until convergence 
    Solve $A_k S_k = -F(X_k)$   /* for $S_k$ */ 
    Set $X_{k+1} = X_k + S_k$ 
    Set $Y_k = F(X_{k+1}) - F(X_k)$ 
    Solve $A_{k+1} S_k = Y_k$   /* for $A_{k+1}$ */ 
End For 

We can generate the sequence $B_k = A_k^{-1}$, instead of $A_k$, and obtain an inverse 
version that solves only one linear system of equations per iteration: 

Algorithm 3: Inverse secant method 

Given $X_{-1} \in \mathbb{C}^{n \times n}$, $X_0 \in \mathbb{C}^{n \times n}$ 
Set $S_{-1} = X_0 - X_{-1}$ 
Set $Y_{-1} = F(X_0) - F(X_{-1})$ 
Solve $B_0 Y_{-1} = S_{-1}$   /* for $B_0$ */ 
For $k = 0, 1, \ldots$ until convergence 
    Set $S_k = -B_k F(X_k)$ 
    Set $X_{k+1} = X_k + S_k$ 
    Set $Y_k = F(X_{k+1}) - F(X_k)$ 
    Solve $B_{k+1} Y_k = S_k$   /* for $B_{k+1}$ */ 
End For 

Working only with $n \times n$ matrices is the most attractive 
feature of our proposal, and represents a sharp contrast with the standard extension 
of quasi-Newton methods for general Hilbert spaces (see, e.g., [6, 14]), which in this 
context would involve $n^2 \times n^2$ linear operators to approximate the derivative of $F$. 
Clearly, dealing with $n \times n$ matrices for solving the related linear systems 
significantly reduces the computational cost associated with the linear algebra of the 
algorithm. 
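
As a rough illustration of how the inverse secant method (Algorithm 3) only ever solves $n \times n$ systems, here is a minimal sketch. The function name `inverse_secant`, the stopping rule on $\|F(X_k)\|$, and the affine test map $F(X) = MX - C$ are assumptions made for this example; the affine map is chosen because the secant equation then returns $B_0 = M^{-1}$ exactly, so the iteration stops after one step, just as the scalar secant method is exact for linear functions.

```python
import numpy as np

def inverse_secant(F, X_prev, X, tol=1e-12, max_iter=50):
    """Sketch of the inverse secant method; returns (X, number of steps taken)."""
    S = X - X_prev
    FX = F(X)
    Y = FX - F(X_prev)
    # Solve B Y = S for the n x n matrix B, i.e. Y^T B^T = S^T.
    B = np.linalg.solve(Y.T, S.T).T
    for k in range(max_iter):
        if np.linalg.norm(FX) <= tol:
            return X, k
        S = -B @ FX
        X = X + S
        F_new = F(X)
        Y = F_new - FX
        FX = F_new
        if np.linalg.norm(FX) <= tol:
            return X, k + 1          # stop before Y becomes numerically singular
        B = np.linalg.solve(Y.T, S.T).T
    return X, max_iter

# Hypothetical affine test problem F(X) = M X - C, with root X = M^{-1} C.
M = np.array([[4.0, 1.0], [2.0, 3.0]])
C = np.array([[1.0, 2.0], [3.0, 4.0]])
X_star, steps = inverse_secant(lambda X: M @ X - C, np.zeros((2, 2)), np.eye(2))
```

The convergence check is placed before each update of `B` because, once the iterates have converged, both $S_k$ and $Y_k$ are close to zero and the secant equation becomes ill-conditioned.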

In order to discuss some theoretical issues of the proposed general secant 
methods, let us consider the standard assumptions for problem (18.1): $F : \mathbb{C}^{n \times n} \to 
\mathbb{C}^{n \times n}$ is continuously differentiable in an open and convex set $D \subset \mathbb{C}^{n \times n}$. There 
exist $X_* \in \mathbb{C}^{n \times n}$ and $r > 0$ such that $N(X_*, r) \subset D$ is an open neighborhood of 
radius $r$ around $X_*$, $F(X_*) = 0$, $F'(X_*)$ is nonsingular, and $F'(X) \in 
\mathrm{Lip}_\gamma(N(X_*, r))$, i.e., $F'(X)$ is a Lipschitz continuous function with constant $\gamma > 0$ 
in $N(X_*, r)$. 

We begin by noticing that the operator $A_k$ does not approximate $F'(X_k)$ as in 
previous scenarios due to dimensional discrepancies. Indeed, $F'(X_k) \in \mathbb{C}^{n^2 \times n^2}$ and 
$A_k \in \mathbb{C}^{n \times n}$. However, fortunately, $F'(X_k)S_k$ and $A_k S_k$ both live in $\mathbb{C}^{n \times n}$, which 
turns out to be the suitable approximation since, using the secant equation (18.3), 
we have that 

$$A_{k+1}S_k = Y_k = F(X_{k+1}) - F(X_k) = F'(X_k)S_k + R(S_k). \tag{18.4}$$ 

Subtracting $F'(X_*)S_k$ on both sides of (18.4), and taking norms we obtain 

$$\|A_{k+1}S_k - F'(X_*)S_k\| \leq \|F'(X_k) - F'(X_*)\| \, \|S_k\| + \|R(S_k)\|,$$ 

for any subordinate norm $\| \cdot \|$. Using now that $F'(X) \in \mathrm{Lip}_\gamma(N(X_*, r))$, and 
dividing by $\|S_k\|$ we have 

$$\frac{\|A_{k+1}S_k - F'(X_*)S_k\|}{\|S_k\|} \leq \gamma \|E_k\| + \frac{\|R(S_k)\|}{\|S_k\|}, \tag{18.5}$$ 

where $E_k = X_k - X_*$ represents the error matrix. 

From this inequality we observe that, if convergence is attained, the left hand 
side tends to zero when $k$ goes to infinity, and so the sequence $\{A_k\}$, generated by 
Algorithm 2, tends to the Fréchet derivative, $F'(X_*)$, when they are both applied to 
the direction of the step $S_k$. Concerning local convergence, we have from Step 7 in 
Algorithm 2 that 

$$E_{k+1} = E_k - A_k^{-1} F(X_k) = E_k - A_k^{-1} F'(X_*) E_k - O(E_k^2),$$ 

which implies that 

$$\|E_{k+1}\| \leq \|E_k - A_k^{-1}(F'(X_*)E_k)\| + O(\|E_k\|^2). \tag{18.6}$$ 

Consequently, if $A_k$ in our secant algorithms is such that $A_k^{-1}(F'(X_*)E_k)$ approximates 
$E_k$ in a neighborhood of $X_*$, as expected, then $\|E_{k+1}\|$ is reduced with 
respect to $\|E_k\|$. Inequalities (18.5) and (18.6), somehow, explain the convergence 
behavior we have observed in our numerical results. In our next section, though, 
we will establish formally the stability and also the local and q-superlinear convergence 
of the proposed secant methods for the special case of computing the 
inverse or the pseudoinverse of a given matrix. 



18.3 Special Case: Inverse or Pseudoinverse of a Matrix 

For computing the inverse of a given matrix A we will consider iterative methods 
to find the root of 

$$F(X) = X^{-1} - A, \tag{18.7}$$ 

and for the sake of clarity let us assume, for a while, that A is nonsingular. 

Newton's method from an initial guess $X_0$, for solving (18.7), also known as 
Schulz method [15], is given by 

$$X_{k+1} = 2X_k - X_k A X_k. \tag{18.8}$$ 

It has been established that if $X_0 = A^T/\|A\|_2^2$, then Schulz method possesses global 
convergence [7, 17]. Moreover, if $A$ does not have an inverse, it converges to the 
pseudoinverse (also known as the generalized inverse) of $A$ [7, 8, 17]. 

First, let us consider the general secant method applied to (18.7) 

$$\begin{aligned} X_{k+1} &= X_k - S_{k-1}(F(X_k) - F(X_{k-1}))^{-1}F(X_k) \\ &= X_k - (X_k - X_{k-1})(X_k^{-1} - X_{k-1}^{-1})^{-1}(X_k^{-1} - A). \end{aligned} \tag{18.9}$$ 

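Let us assume that $A$ is diagonalizable, that is, there exists a nonsingular matrix $V$ 
such that 

$$V^{-1}AV = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$$ 

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$, and let us define $D_k = V^{-1}X_kV$. 
From (18.9) we have that 

$$\begin{aligned} D_{k+1} &= D_k - (V^{-1}X_k - V^{-1}X_{k-1})VV^{-1}(X_k^{-1} - X_{k-1}^{-1})^{-1}VV^{-1}(X_k^{-1}V - AV) \\ &= D_k - (D_k - D_{k-1})(D_k^{-1} - D_{k-1}^{-1})^{-1}(D_k^{-1} - \Lambda). \end{aligned} \tag{18.10}$$ 

Note that if we choose $X_{-1}$ and $X_0$ such that $D_{-1} = V^{-1}X_{-1}V$ and $D_0 = V^{-1}X_0V$ 
are diagonal matrices, then all successive $D_k$ are diagonal too, and in this case 
$D_iD_j = D_jD_i$ for all $i, j$. Therefore (18.10) can be written as 

$$\begin{aligned} D_{k+1} &= D_k - (D_k - D_{k-1})(D_k^{-1}D_{k-1}^{-1}D_{k-1} - D_{k-1}^{-1}D_k^{-1}D_k)^{-1}(D_k^{-1} - \Lambda) \\ &= D_k - (D_k - D_{k-1})(D_k^{-1}D_{k-1}^{-1}D_{k-1} - D_k^{-1}D_{k-1}^{-1}D_k)^{-1}(D_k^{-1} - \Lambda) \\ &= D_k - (D_k - D_{k-1})((D_{k-1}D_k)^{-1}(D_{k-1} - D_k))^{-1}(D_k^{-1} - \Lambda) \\ &= D_k + (D_k - D_{k-1})(D_k - D_{k-1})^{-1}(D_{k-1}D_k)(D_k^{-1} - \Lambda) \\ &= D_{k-1} + D_k - D_{k-1}\Lambda D_k. \end{aligned} \tag{18.11}$$ 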

Motivated by (18.11) we now consider the specialized secant method for (18.7), 

$$X_{k+1} = X_{k-1} + X_k - X_{k-1}AX_k, \tag{18.12}$$ 

that avoids the inverse matrix calculations per iteration associated with iteration 
(18.9). Notice the resemblance between (18.12) and Schulz method for solving 
the same problem. Therefore, in what follows (18.12) will be denoted as the 
secant-Schulz method. Our next result establishes that if $A$ is diagonalizable and 
the two initial guesses are chosen properly, then the secant-Schulz method converges 
locally and q-superlinearly to the inverse of $A$. 
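
The two inverse-free iterations are easy to compare side by side. The sketch below implements the Schulz (Newton) iteration (18.8) and the secant-Schulz iteration (18.12); the function names, the residual-based stopping rule $\|I - AX_k\| \leq$ `tol`, and the $2 \times 2$ test matrix are assumptions for illustration only. The starting guesses are scaled multiples of $A^T/\|A\|_2^2$, in the spirit of the choices discussed in this section.

```python
import numpy as np

def newton_schulz(A, X, tol=1e-12, max_iter=100):
    """Schulz iteration X_{k+1} = 2 X_k - X_k A X_k, see (18.8)."""
    I = np.eye(A.shape[0])
    for k in range(max_iter):
        if np.linalg.norm(I - A @ X) <= tol:
            return X, k
        X = 2.0 * X - X @ A @ X
    return X, max_iter

def secant_schulz(A, X_prev, X, tol=1e-12, max_iter=100):
    """Secant-Schulz iteration X_{k+1} = X_{k-1} + X_k - X_{k-1} A X_k, see (18.12)."""
    I = np.eye(A.shape[0])
    for k in range(max_iter):
        if np.linalg.norm(I - A @ X) <= tol:
            return X, k
        X_prev, X = X, X_prev + X - X_prev @ A @ X
    return X, max_iter

# Hypothetical small nonsingular test matrix.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
X0 = A.T / np.linalg.norm(A, 2) ** 2          # A^T / ||A||_2^2
X_newton, it_newton = newton_schulz(A, X0)
X_secant, it_secant = secant_schulz(A, 0.2 * X0, X0)
```

Each secant-Schulz step needs two matrix products where a Schulz step needs two as well, but the secant variant reuses the previous iterate instead of squaring the current residual, which is why it is expected to converge superlinearly rather than quadratically.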

Theorem 1.1 Let $A \in \mathbb{C}^{n \times n}$ be a nonsingular diagonalizable matrix, that is, there 
exists a nonsingular matrix $V$ such that 

$$V^{-1}AV = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$$ 

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$. Let $X_{-1}$ and $X_0$ be such that 
$V^{-1}X_{-1}V$ and $V^{-1}X_0V$ are diagonal matrices. Then the secant-Schulz method 
converges locally and q-superlinearly to the inverse of $A$. 

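Proof Let us define $D_k = V^{-1}X_kV$ for all $k \geq -1$. From (18.12) we have that 

$$D_{k+1} = D_{k-1} + D_k - D_{k-1}\Lambda D_k. \tag{18.13}$$ 

Since $D_{-1}$ and $D_0$ are diagonal matrices, all successive $D_k$ are diagonal too, 
and in this case $D_iD_j = D_jD_i$ for all $i, j$. Moreover, since $D_j = \mathrm{diag}(d_j^1, d_j^2, \ldots, d_j^n)$ 
we see from (18.13) that 

$$d_{k+1}^i = d_{k-1}^i + d_k^i - d_{k-1}^i \lambda_i d_k^i, \quad \text{for all } 1 \leq i \leq n, \tag{18.14}$$ 

where (18.14) represents $n$ uncoupled scalar secant iterations converging to 
$1/\lambda_i$, $1 \leq i \leq n$. Indeed, subtracting $1/\lambda_i$ on both sides of (18.14) and letting $e_k^i = 
d_k^i - 1/\lambda_i$ we have that 

$$\begin{aligned} e_{k+1}^i &= d_k^i + d_{k-1}^i - d_{k-1}^i d_k^i \lambda_i - 1/\lambda_i \\ &= -\lambda_i\left(d_k^i d_{k-1}^i - d_k^i/\lambda_i - d_{k-1}^i/\lambda_i + 1/\lambda_i^2\right) \\ &= -\lambda_i (d_k^i - 1/\lambda_i)(d_{k-1}^i - 1/\lambda_i) \\ &= -\lambda_i e_k^i e_{k-1}^i. \end{aligned} \tag{18.15}$$ 

From (18.15) we conclude that each scalar secant iteration (18.14) converges 
locally and q-superlinearly to $1/\lambda_i$. Therefore, equivalently [4], there exists a 
sequence $\{c_k^i\}$, for each $1 \leq i \leq n$, such that $c_k^i > 0$ for all $k$, $\lim_{k \to \infty} c_k^i = 0$, and 

$$|e_{k+1}^i| \leq c_k^i |e_k^i|. \tag{18.16}$$ 

Using (18.16) we now obtain in the Frobenius norm 

$$\|D_{k+1} - \Lambda^{-1}\|_F^2 = \sum_{i=1}^n (e_{k+1}^i)^2 \leq \sum_{i=1}^n (c_k^i)^2 (e_k^i)^2 \leq n\,\hat{c}_k^2 \sum_{i=1}^n (e_k^i)^2 \leq n\,\hat{c}_k^2\,\|D_k - \Lambda^{-1}\|_F^2, \tag{18.17}$$ 

where $\hat{c}_k = \max_{1 \leq i \leq n}\{c_k^i\}$. 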

Finally, we have that 

$$\begin{aligned} \|X_{k+1} - A^{-1}\|_F &= \|VV^{-1}(X_{k+1} - A^{-1})VV^{-1}\|_F \\ &= \|V(D_{k+1} - \Lambda^{-1})V^{-1}\|_F \\ &\leq \kappa_F(V)\,\|D_{k+1} - \Lambda^{-1}\|_F \\ &\leq \kappa_F(V)\sqrt{n}\,\hat{c}_k\,\|D_k - \Lambda^{-1}\|_F \\ &= \kappa_F(V)\sqrt{n}\,\hat{c}_k\,\|V^{-1}(X_k - A^{-1})V\|_F \\ &\leq \kappa_F(V)^2\sqrt{n}\,\hat{c}_k\,\|X_k - A^{-1}\|_F, \end{aligned} \tag{18.18}$$ 

where $\kappa_F(V)$ is the Frobenius condition number of $V$. Hence, the secant-Schulz 
method converges locally and q-superlinearly to the inverse of $A$. □ 

When $A$ has no inverse, we can prove that the secant-Schulz method converges 
locally and q-superlinearly to the pseudoinverse of $A$, denoted by $A^\dagger$. For this case, 
let $A \in \mathbb{C}^{m \times n}$ be a matrix of rank $r$, and let us assume that its singular value 
decomposition is given by 

$$A = U \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} V^*, \tag{18.19}$$ 

where $U \in \mathbb{C}^{m \times m}$, $V \in \mathbb{C}^{n \times n}$ are unitary matrices and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$, 
where $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the singular values of $A$. 
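Corollary 1.2 Let $A \in \mathbb{C}^{m \times n}$ be a matrix of rank $r$, and let $X_{-1}$ and $X_0$ be such 
that $V^*X_{-1}U = \begin{pmatrix} D_{-1} & 0 \\ 0 & 0 \end{pmatrix}$ and $V^*X_0U = \begin{pmatrix} D_0 & 0 \\ 0 & 0 \end{pmatrix}$, where $V^*$, $U$ are defined in 
(18.19) and $D_{-1}, D_0 \in \mathbb{C}^{r \times r}$ are diagonal matrices. Then the secant-Schulz method 
converges locally and q-superlinearly to the pseudoinverse of $A$. 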

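Proof From iteration (18.12) and defining $\begin{pmatrix} D_k & 0 \\ 0 & 0 \end{pmatrix} = V^*X_kU$, with $D_k \in \mathbb{C}^{r \times r}$, 
we have that 

$$\begin{pmatrix} D_{k+1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} D_{k-1} + D_k - D_{k-1}\Sigma D_k & 0 \\ 0 & 0 \end{pmatrix}. \tag{18.20}$$ 

Since $X_{-1}$ and $X_0$ are such that $D_{-1}$ and $D_0$ are diagonal matrices then, using the 
same arguments as in the proof of Theorem 1.1, we obtain that 

$$D_{k+1} = D_{k-1} + D_k - D_{k-1}\Sigma D_k$$ 

represents $r$ uncoupled scalar secant iterations that converge locally and 
q-superlinearly to $1/\sigma_i$, $1 \leq i \leq r$, that is, 

$$\|D_{k+1} - \Sigma^{-1}\|_F^2 \leq r\,\hat{c}_k^2\,\|D_k - \Sigma^{-1}\|_F^2, \tag{18.21}$$ 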




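where $\hat{c}_k = \max_{1 \leq i \leq r}\{c_k^i\}$ and the sequences $\{c_k^i\}$ are such that $c_k^i > 0$ and 
$\lim_{k \to \infty} c_k^i = 0$ for each $1 \leq i \leq r$. Finally, using the same arguments used for 
obtaining (18.18) we have that 

$$\begin{aligned} \|X_{k+1} - A^\dagger\|_F &= \|VV^*(X_{k+1} - A^\dagger)UU^*\|_F \\ &\leq \sqrt{mn}\,\|D_{k+1} - \Sigma^{-1}\|_F \\ &\leq \sqrt{mn}\,\sqrt{r}\,\hat{c}_k\,\|D_k - \Sigma^{-1}\|_F \\ &\leq mn\sqrt{r}\,\hat{c}_k\,\|X_k - A^\dagger\|_F. \end{aligned}$$ □ 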



It is important to note that Theorem 1.1 implies the well-known Dennis-Moré 
condition [3, 4] 

$$\lim_{k \to \infty} \frac{\|A_k S_k - F'(X_*)S_k\|}{\|S_k\|} = 0,$$ 

that establishes the most important property of the sequence $\{A_k\}$ generated by the 
secant-Schulz method. 

We now discuss the stability of our specialized secant method for the inverse 
matrix. First, let us recall the suitable definition from [8]. The fixed point iteration 
$Y_{k+1} = G(Y_k)$ is stable in a neighborhood of a fixed point $Y_*$ if the Fréchet 
derivative $G'(Y_*)$ has bounded powers. 

Theorem 1.3 The secant-Schulz method generates a stable iteration. 

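Proof The secant-Schulz method, as a fixed point iteration, can be obtained 
setting 

$$Y_{k+1} = \begin{pmatrix} X_{k+1} \\ X_k \end{pmatrix}, \qquad Y_k = \begin{pmatrix} X_k \\ X_{k-1} \end{pmatrix},$$ 

and 

$$G(Y_k) = G\begin{pmatrix} X_k \\ X_{k-1} \end{pmatrix} = \begin{pmatrix} X_{k-1} + X_k - X_{k-1}AX_k \\ X_k \end{pmatrix}.$$ 

Therefore, the map we need to study is given by 

$$G\begin{pmatrix} W \\ Z \end{pmatrix} = \begin{pmatrix} Z + W - ZAW \\ W \end{pmatrix}$$ 
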
for $W$ and $Z$ in $\mathbb{C}^{n \times n}$. Now we will use Taylor series for identifying $G'$: 

$$G(Y + P) = G(Y) + G'(Y)P + R(P), \tag{18.22}$$ 

where $P = (E_1, E_2)^T$, $E_1$ and $E_2$ are perturbation matrices, and $R$ is such that 

$$\lim_{\|P\| \to 0} \frac{\|R(P)\|}{\|P\|} = 0.$$ 

We have that 

$$\begin{aligned} G\begin{pmatrix} W + E_1 \\ Z + E_2 \end{pmatrix} &= \begin{pmatrix} ((Z + E_2) + (W + E_1)) - (Z + E_2)A(W + E_1) \\ W + E_1 \end{pmatrix} \\ &= \begin{pmatrix} (Z + W - ZAW) + (E_1 + E_2 - ZAE_1 - E_2AW) - E_2AE_1 \\ W + E_1 \end{pmatrix}. \end{aligned} \tag{18.23}$$ 

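Comparing Eqs. (18.22) and (18.23) we conclude that 

$$G'(Y)P = \begin{pmatrix} E_1 + E_2 - ZAE_1 - E_2AW \\ E_1 \end{pmatrix}. \tag{18.24}$$ 

When $Y = Y_* = (A^{-1}, A^{-1})^T$, from (18.24) we obtain that 

$$G'(Y_*)P = \begin{pmatrix} 0 \\ E_1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ I & 0 \end{pmatrix} \begin{pmatrix} E_1 \\ E_2 \end{pmatrix}.$$ 

Therefore, $G'(Y_*)$ is a nilpotent matrix ($G'(Y_*)^2 = 0$), so its powers are bounded and the iteration is stable. □ 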
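
The structure of $G'(Y_*)$ can be checked numerically. In the sketch below, the test matrix `A` and the perturbations `E1`, `E2` are arbitrary illustrative values, and `dG_at_fixed_point` is a hypothetical helper that simply encodes (18.24) evaluated at $W = Z = A^{-1}$. Evaluating $G$ at a perturbed fixed point shows that the first block changes only by the second-order term $-E_2AE_1$, and applying the linear part twice gives zero.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])       # arbitrary nonsingular test matrix
A_inv = np.linalg.inv(A)

def G(W, Z):
    """Fixed-point map of the secant-Schulz iteration: (W, Z) -> (Z + W - Z A W, W)."""
    return Z + W - Z @ A @ W, W

def dG_at_fixed_point(E1, E2):
    """Linear part of G at Y_* = (A^{-1}, A^{-1}), read off from (18.24)."""
    return np.zeros_like(E1), E1

E1 = np.array([[1.0, -2.0], [0.5, 3.0]])
E2 = np.array([[0.0, 1.0], [-1.0, 2.0]])

top, bottom = G(A_inv + E1, A_inv + E2)
second_order = top - A_inv                    # equals -E2 A E1: no first-order term
twice = dG_at_fixed_point(*dG_at_fixed_point(E1, E2))
```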

We now present a set of experiments to compute the inverse of a given matrix 
using the secant-Schulz iterative method. We choose as initial guesses $X_{-1} = 
\alpha A^T/\|A\|_2^2$ and $X_0 = \beta A^T/\|A\|_2^2$, with $\alpha, \beta \in (0, 1]$. When $A$ is symmetric and 
positive definite we can also choose $X_{-1} = \alpha I$ and $X_0 = \beta I$ with $\alpha > 0$ and $\beta > 0$. 
For these initial choices global convergence can be established. Indeed, from 
(18.12) we obtain that 

$$\|I - AX_{k+1}\|_2 \leq \|I - AX_{-1}\|_2^{\alpha_k} \, \|I - AX_0\|_2^{\beta_k}, \tag{18.25}$$ 

where $\alpha_k > 0$ and $\beta_k > 0$. Note that the matrices $I - AX_{-1}$ and $I - AX_0$ are 
symmetric, and for this case (18.25) can be written as 

$$\|I - AX_{k+1}\|_2 \leq \rho(I - AX_{-1})^{\alpha_k} \, \rho(I - AX_0)^{\beta_k},$$ 

where $\rho(B)$ represents the spectral radius of the matrix $B$. Finally, using similar 
arguments to the ones used to prove the global convergence of Schulz method 
[7, 17], we can prove that $\rho(I - AX_{-1}) < 1$ and $\rho(I - AX_0) < 1$. 
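
The step from (18.12) to (18.25) rests on a short identity. The derivation below is a sketch: the residual symbol $R_k := I - AX_k$ is introduced here only for illustration, and the recursion shown for the exponents is one consistent choice obtained by induction.

```latex
\begin{aligned}
R_{k+1} &= I - A\,(X_{k-1} + X_k - X_{k-1} A X_k) \\
        &= (I - A X_{k-1})(I - A X_k) = R_{k-1} R_k, \\
\|R_{k+1}\|_2 &\le \|R_{k-1}\|_2 \, \|R_k\|_2 , \\
\alpha_0 &= \beta_0 = 1, \quad \alpha_1 = 1,\ \beta_1 = 2, \quad
\alpha_{k+1} = \alpha_k + \alpha_{k-1}, \quad \beta_{k+1} = \beta_k + \beta_{k-1}.
\end{aligned}
```

The exponents therefore grow like Fibonacci numbers, which is consistent with the superlinear (rather than quadratic) rate observed for the secant-Schulz method.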

In our implementation we stop all considered algorithms when 

$$\|X_k - X_*\| / \|X_*\| < 0.5 \times 10^{-14}.$$ 

All experiments were run on a Pentium Centrino Duo, 2.0 GHz, using Matlab 7. 
We report the number of required iterations (Iter) and the relative error 
($\|X_k - X_*\|/\|X_*\|$) when the process is stopped. For our first experiment we 
consider the symmetric and positive definite matrix poisson from the Matlab 
gallery with $n = 400$. We compare the performance of the secant-Schulz method 
with the Newton-Schulz method described in (18.8). For the secant-Schulz method 
we choose $X_{-1} = 0.5 I$ and $X_0 = A^T/\|A\|_2^2$, and for the Newton-Schulz we 
choose the same $X_0$. We report the results in Table 18.1, and the semilog plot of the 
relative error in Fig. 18.1. 

Table 18.1 Performance of secant-Schulz and Newton-Schulz for finding the inverse of A = gallery('poisson',20) when $n = 400$, $X_{-1} = 0.5 I$, and $X_0 = A^T/\|A\|_2^2$ 

Method          Iter   $\|X_k - X_*\|/\|X_*\|$ 
Secant-Schulz   18     1.95e-15 
Newton-Schulz   22     1.87e-15 

Fig. 18.1 Semilog of the relative error for finding the inverse of A = gallery('poisson',20) when $n = 400$, $X_{-1} = 0.5 I$, and $X_0 = A^T/\|A\|_2^2$ (horizontal axis: Iterations) 

For our second experiment we consider the nonsymmetric matrix grcar from 
the Matlab gallery with $n = 200$. For the secant-Schulz method we choose 
$X_{-1} = 0.2 A^T/\|A\|_2^2$ and $X_0 = A^T/\|A\|_2^2$, whereas for the Newton-Schulz we 
choose $X_0 = A^T/\|A\|_2^2$. We compare the performance of the secant-Schulz method 
with the Newton-Schulz. We report the results in Table 18.2. 

Table 18.2 Performance of secant-Schulz and Newton-Schulz for finding the inverse of A = gallery('grcar',200) with $X_{-1} = 0.2 A^T/\|A\|_2^2$ and $X_0 = A^T/\|A\|_2^2$ 

Method          Iter   $\|X_k - X_*\|/\|X_*\|$ 
Secant-Schulz   14     2.69e-15 
Newton-Schulz   10     4.32e-16 

As in the Newton-Schulz method, the secant-Schulz method also converges to 
the pseudo-inverse of any given matrix. For our next experiment we consider the 
rectangular matrix cycol from the Matlab gallery with n = [100 10] to compute 
its pseudoinverse, starting from the same initial choices of the previous experiment. 
We report the results in Table 18.3. 

Table 18.3 Performance of secant-Schulz and Newton-Schulz for finding the pseudo-inverse of A = gallery('cycol',n,8) with $X_{-1} = 0.2 A^T/\|A\|_2^2$ and $X_0 = A^T/\|A\|_2^2$ 

Method          Iter   $\|X_k - X_*\|/\|X_*\|$ 
Secant-Schulz   9      1.85e-15 
Newton-Schulz   8      1.86e-15 

In all the experiments we observe the typical q-superlinear behavior of the 
proposed secant-Schulz method as compared with the q-quadratic behavior asso- 
ciated with the Newton-Schulz method. 



18.4 Quadratic Matrix Equation 

We now consider the application of Algorithms 2 and 3 for solving quadratic 
matrix equations of the form $AX^2 + BX + C = 0$, where $A$, $B$, and $C$ are $n \times n$ 
matrices. For a recent globalized implementation of Newton's method see [10]. 
For our secant algorithms we set $F(X) = AX^2 + BX + C$ and seek roots of $F$. For 
this special case, the general secant algorithm can be simplified as follows: 

Algorithm 4: Secant method for quadratic problems 

Given $X_{-1} \in \mathbb{C}^{n \times n}$, $X_0 \in \mathbb{C}^{n \times n}$ 
Set $S_{-1} = X_0 - X_{-1}$ 
Solve $W_0 S_{-1} = A(X_0^2 - X_{-1}^2)$   /* for $W_0$ */ 
Set $A_0 = W_0 + B$ 
For $k = 0, 1, \ldots$ until convergence 
    Solve $A_k S_k = -F(X_k)$   /* for $S_k$ */ 
    Set $X_{k+1} = X_k + S_k$ 
    Solve $W_{k+1} S_k = A(X_{k+1}^2 - X_k^2)$   /* for $W_{k+1}$ */ 
    Set $A_{k+1} = W_{k+1} + B$ 
End For 

and the inverse version of the secant algorithm can be written as follows: 

Algorithm 5: Inverse secant method for quadratic problems 

Given $X_{-1} \in \mathbb{C}^{n \times n}$, $X_0 \in \mathbb{C}^{n \times n}$ 
Set $S_{-1} = X_0 - X_{-1}$ 
Set $Y_{-1} = A(X_0^2 - X_{-1}^2) + B(X_0 - X_{-1})$ 
Solve $B_0 Y_{-1} = S_{-1}$   /* for $B_0$ */ 
For $k = 0, 1, \ldots$ until convergence 
    Set $S_k = -B_k F(X_k)$ 
    Set $X_{k+1} = X_k + S_k$ 
    Set $Y_k = A(X_{k+1}^2 - X_k^2) + B S_k$ 
    Solve $B_{k+1} Y_k = S_k$   /* for $B_{k+1}$ */ 
End For 
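
A compact sketch of the direct variant (Algorithm 4) is given below. The function names, the tolerance, the use of Frobenius norms in the relative residual, and the toy problem $X^2 - 3X + 2I = 0$ are assumptions made for this illustration and are not the experiments reported in this chapter. The toy problem is chosen because all iterates stay multiples of the identity, so the matrix iteration reduces to a scalar secant iteration whose limit ($2I$) is easy to check by hand.

```python
import numpy as np

def F(A, B, C, X):
    return A @ X @ X + B @ X + C

def residual(A, B, C, X):
    """Relative residual of the quadratic equation (Frobenius norms assumed)."""
    nA, nB, nC, nX = (np.linalg.norm(M) for M in (A, B, C, X))
    return np.linalg.norm(F(A, B, C, X)) / (nA * nX**2 + nB * nX + nC)

def secant_operator(A, B, X_old, X_new):
    """Return W + B, where W solves W (X_new - X_old) = A (X_new^2 - X_old^2)."""
    S = X_new - X_old
    rhs = A @ (X_new @ X_new - X_old @ X_old)
    W = np.linalg.solve(S.T, rhs.T).T          # W S = rhs  <=>  S^T W^T = rhs^T
    return W + B

def quadratic_secant(A, B, C, X_prev, X, tol=1e-12, max_iter=50):
    """Sketch of the direct secant method for A X^2 + B X + C = 0."""
    Ak = secant_operator(A, B, X_prev, X)
    for k in range(max_iter):
        if residual(A, B, C, X) <= tol:
            return X, k
        X_new = X + np.linalg.solve(Ak, -F(A, B, C, X))
        Ak = secant_operator(A, B, X, X_new)
        X = X_new
    return X, max_iter

# Toy problem X^2 - 3X + 2I = 0, started from 3I and 2.5I (hypothetical guesses).
n = 3
I = np.eye(n)
X_sol, iters = quadratic_secant(I, -3.0 * I, 2.0 * I, 3.0 * I, 2.5 * I)
```

Both linear solves per iteration involve only $n \times n$ matrices: one for the step and one for the updated operator.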

We now present some experiments to illustrate the advantages of using Algorithms 
4 and 5 for solving quadratic matrix equations. For that we choose two 
examples already studied and described in [9, 10]. We choose as initial guesses 
$X_{-1} = 0.1 I$ and $X_0 = \beta I$, as in [2] for Newton's method, where 

$$\beta = \left(\|B\|_F + \sqrt{\|B\|_F^2 + 4\|A\|_F\|C\|_F}\right) / \left(2\|A\|_F\right).$$ 

In our implementation we stop the algorithms when $\mathrm{Res}(X_k) < n \cdot \mathrm{eps}$, where 
$\mathrm{Res}(X_k) = \|F(X_k)\|_F / (\|A\|_F\|X_k\|_F^2 + \|B\|_F\|X_k\|_F + \|C\|_F)$ and $\mathrm{eps} = 2.2 \times 10^{-16}$. 
This stopping criterion is also suggested in [10]. These experiments were also run 
on a Pentium Centrino Duo, 2.0 GHz, using Matlab 7. We report the number of 
required iterations (Iter) and the value of $\mathrm{Res}(X_k)$ when the process is stopped. For 
our first experiment we consider the problem described by the following matrices 
A^L 



Problem (18.26) has a solvent at $X_* = I$. 

Table 18.4 Performance of secant and inverse secant for solving problem (18.26) 

$X_0$          Method           Iter   $\mathrm{Res}(X_k)$ 
$\beta I$      Secant           10     4.15e-17 
$\beta I$      Inverse secant   11     2.22e-17 
$10 I$         Secant           13     2.22e-17 
$10 I$         Inverse secant   14     3.14e-17 
$10^5 I$       Secant           15     1.57e-17 
$10^5 I$       Inverse secant   16     5.02e-19 
$10^{10} I$    Secant           15     2.74e-19 
$10^{10} I$    Inverse secant   16     2.22e-17 

Table 18.5 Performance of secant and inverse secant for solving our second quadratic 
experiment 



Xo 



Method 



Iter 



Res(Xt) 



Pl 

10-7 

10-7 

10 V 

10 V 

10'"/ 

10'"/ 

lO^O/ 

102°7 



Secant 

Inverse secant 
Secant 

Inverse secant 
Secant 

Inverse secant 
Secant 

Inverse secant 
Secant 
Inverse secant 



12 
18 
15 
18 
17 
17 
18 
16 
15 
17 



1.62e-14 

9.93e-I5 

3.76e-15 

I.23e-I4 

1.92e-I5 

2.2e-14 

1.7Ie-I5 

7.55e-I5 

1.62e-I4 

2.05e-14 



We compare the performance of the direct secant method (Algorithm 4) with 
the inverse secant method (Algorithm 5). We report the results in Table 18.4. 

For our second experiment we consider the problem described in [9] which is 
given by the following matrices $A = I$, $B = \mathrm{tridiag}[-10, 30, -10]$ except 
$B(1, 1) = B(n, n) = 20$, and $C = \mathrm{tridiag}[-5, 15, -5]$, for $n = 100$. We compare 
the performance of the direct secant method (Algorithm 4) with the inverse 
secant method (Algorithm 5). We report the results in Table 18.5 and Fig. 18.2. 
We observe in both experiments that the secant algorithms show a robust 
behavior, converging from initial guesses either close to or far away from the 
solution. In contrast, as reported in [10], Newton's method requires an exact line 
search globalization strategy to avoid the increase in number of iterations for 
convergence. 

Fig. 18.2 Semilog of the relative residual for solving our second quadratic experiment when 
n = 100 and Xo = lO'/ 



18.5 Conclusions and Perspectives 

Whenever a Newton's method is applicable to a general nonlinear problem, a 
suitable secant method should be obtained for the same problem. In this work we 
present an interpretation of the classical secant method for solving nonlinear 
matrix problems. In the special case of computing the inverse of a given matrix, 
we present and fully analyze a specialized version, the secant-Schulz method, that 
resembles the well-known Schulz method which is a specialized version of 
Newton's method. 

For solving quadratic matrix problems, we explore the use of the direct and also 
the inverse secant method. Our preliminary numerical experiments show the 
expected q-superlinear convergence, and indicate that these secant schemes seem 
to have interesting properties that remain to be established. 

Finally, we hope that our specialized secant methods, for solving some simple 
cases, stimulate further extensions and research for solving additional and more 
complicated nonlinear matrix problems. 

Chapter 19 

On FastICA Algorithms and Some Generalisations 

Hao Shen, Knut Hüper and Martin Kleinsteuber 



Abstract The FastICA algorithm, a classical method for solving the one-unit 
linear ICA problem, and its generalisations are studied. Two interpretations of 
FastICA are provided, a scalar shifted algorithm and an approximate Newton 
method. Based on these two interpretations, two natural generalisations of FastICA 
on a full matrix are proposed to solve the parallel linear ICA problem. Specifically, 
these are a matrix shifted parallel ICA method and an approximate Newton-like 
parallel ICA method. 



19.1 Introduction 

In recent years, there has been an increasing interest in applying numerical linear 
algebra tools to either analyse existing methods or develop new methods in signal 
processing. In this paper, we present our recent results in analysing and general- 
ising a classic algorithm for doing linear Independent Component Analysis (ICA), 
which is now a standard statistical tool for the problem of Blind Source Separation 
(BSS), a challenging problem in signal processing. The tools we utilise in our 
analysis are similar to the techniques in analysing the Rayleigh quotient iteration 
(RQI), which is a well known method to the numerical linear algebra community 
for computing an eigenvalue eigenvector pair of a real symmetric matrix. 

Since the seminal paper by Comon [6], many ICA algorithms have been 
developed by researchers from various communities. The FastICA algorithm, 
proposed by the Finnish school, is a classic linear ICA algorithm with both good 
accuracy and fast speed of convergence, see [12]. It solves the so-called one-unit 
linear ICA problem, which extracts one signal per pass. Originally, FastICA was 
developed as a Newton type method using a Lagrange multiplier approach together 
with a heuristic approximation of Hessians. 

The first contribution of this work is to give two new rigorous interpretations of 
FastICA. First of all, FastICA can be easily considered as a scalar shifted version 
of a simpler linear ICA algorithm. It can be shown that the scalar shift strategy 
utilised in FastICA accelerates the simpler algorithm to achieve local quadratic 
convergence. Alternatively, by using geometric optimisation techniques, we 
develop an approximate Newton one-unit linear ICA method with a sensible 
approximation of Hessians, which has a rigorous justification by statistical features 
of the linear ICA problem. It can be shown that FastICA is indeed a special case of 
our proposed approximate Newton ICA method. 

Due to the great success of FastICA, a natural question one may raise is whether 
it is possible to generalise FastICA to solve the problem of extracting all sources in 
parallel. On one hand, by generalising the concept of a scalar shift strategy to a 
matrix shift strategy, we develop a matrix shifted parallel linear ICA algorithm, 
which is indeed a natural generalisation of FastICA to a full matrix. On the other 
hand, following the same idea of approximating Hessians as in developing the 
approximate Newton one-unit linear ICA method, we formulate an approximate 
Newton-like parallel linear ICA method, which shares the significant property of 
local quadratic convergence, in this case to a correct separation of all sources. 

This paper is organised as follows. In Sect. 19.2, we give a brief statement of 
the linear ICA problem. In Sect. 19.3, the FastICA algorithm is interpreted as 
special cases of a scalar shifted one-unit linear ICA algorithm and an approximate 
Newton one-unit linear ICA method. As natural generalisations of FastICA to a 
full matrix, Sect. 19.4 presents two parallel linear ICA methods in the frameworks 
of matrix shifted algorithm and approximate Newton-type method. Performance of 
all presented linear ICA methods is demonstrated and compared by several 
numerical experiments in Sect. 19.5. Finally, a conclusion is made in Sect. 19.6. 



19.2 Problem Statement 

In the literature, blind source separation (BSS) refers to the problem of recovering 
signals only from linear mixtures of sources, in the absence of prior information 
about either the sources or the mixing process. It has enormous applications in 
bioinformatics, telecommunications, speech recognition systems, and so on. We 
refer to [5, 14, 20] and references therein for further details. 

A noiseless linear BSS model is generally formulated as follows, refer to [12] 
for more details, 

$$z = As, \tag{19.1}$$ 

where $s = [s_1, \ldots, s_m]^T \in \mathbb{R}^m$ denotes an $m$-dimensional random vector representing 
$m$ source signals, $A \in \mathbb{R}^{m \times m}$ is the mixing matrix of full rank and $z \in \mathbb{R}^m$ 
represents $m$ observed linear mixtures. The task of linear BSS problem (19.1) is to 
recover the source signals $s$ by estimating the mixing matrix $A$ or its inverse $A^{-1}$ 
based only on the observations $z$ via the demixing model 

$$y = Bz, \tag{19.2}$$ 

where $B \in \mathbb{R}^{m \times m}$ is the demixing matrix, an estimation of $A^{-1}$, and $y \in \mathbb{R}^m$ represents 
the corresponding estimated source signals. 

It is clear that, without further constraints, there are infinitely many possibilities 
of B for the demixing BSS model (19.2). Therefore, to allow a solution of the BSS 
problem, one would have to impose certain prior assumptions about either the 
mixing process or the features of sources. One most common assumption is the 
concept of statistical independence, i.e., 

Assumption 1 All individual components of the sources $s \in \mathbb{R}^m$ as in model 
(19.1) are mutually statistically independent. 

It then leads to the standard linear ICA approach for solving the linear BSS 
problem (19.1), refer to [6] for details. 

It is obvious that, for the linear model (19.1), any scaling factors of both the 
columns of A and the components of the sources s are indeed interchangeable, i.e., 
the sign and amplitude of each source cannot be uniquely identified. Moreover, the 
mean value of each source signal is irrelevant to the concept of statistical inde- 
pendence. Therefore, without loss of generality, each component of the sources s 
can be assumed to have zero mean and unit variance. 

In practice, the observations $z$ can always be whitened to simplify the problem 
further. Usually by using principal component analysis (PCA), one computes a 
matrix $C \in \mathbb{R}^{m \times m}$ such that 

$$w = Cz = CAs, \quad \text{where } \mathbb{E}[ww^T] = I_m, \tag{19.3}$$ 

i.e., all components of the whitened mixtures $w$ are uncorrelated and each component 
of $w$ has zero mean and unit variance as well. By denoting the product 
$CA =: V \in \mathbb{R}^{m \times m}$ in (19.3), one observes 

$$I_m = \mathbb{E}[ww^T] = V\,\mathbb{E}[ss^T]\,V^T = VV^T, \tag{19.4}$$ 

i.e., $V \in \mathbb{R}^{m \times m}$ is indeed an orthogonal matrix. The linear relation (19.3) is often 
referred to as the whitened linear ICA model. Let $O(m)$ denote the 
orthogonal group, i.e., 

$$O(m) := \{X \in \mathbb{R}^{m \times m} \mid X^TX = I_m\}. \tag{19.5}$$ 

The whitened linear ICA demixing model is simply stated as follows

$$y = X^\top w, \tag{19.6}$$

where $X \in O(m)$ is called the demixing matrix as an estimation of $V \in O(m)$ and $y$
now represents an estimation of the whitened sources $s$ in (19.3).
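The whitening step (19.3) and the orthogonality of $V = CA$ stated in (19.4) can be illustrated numerically. The following is a minimal sketch, assuming NumPy, sample averages in place of expectations, and synthetic uniform sources; the helper name `whiten` and the mixing matrix `A` are hypothetical choices made for this example only.

```python
import numpy as np

def whiten(z):
    """PCA whitening of observations z (one column per sample).

    Returns (w, C) with w = C @ (z - mean); the sample covariance of w is I.
    """
    zc = z - z.mean(axis=1, keepdims=True)      # centre each mixture
    cov = zc @ zc.T / zc.shape[1]               # sample covariance of z
    evals, evecs = np.linalg.eigh(cov)          # cov = evecs diag(evals) evecs^T
    C = np.diag(evals ** -0.5) @ evecs.T        # whitening matrix
    return C @ zc, C

rng = np.random.default_rng(0)
n = 20000
# Independent sources with zero mean and unit variance (uniform on [-sqrt(3), sqrt(3)]).
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(3, n))
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])                 # hypothetical mixing matrix
z = A @ s
w, C = whiten(z)
V = C @ A                                       # approximately orthogonal, cf. (19.4)
print(np.round(w @ w.T / n, 3))
print(np.round(V @ V.T, 2))
```

With finite samples, $VV^\top$ equals the identity only up to sampling error, because the sample covariance of the sources is itself only approximately $I_m$.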

In some applications, one might prefer to extract only one desired source, rather 
than all sources. Let $X = [x_1, \ldots, x_m] \in O(m)$. Then a single source can be
recovered by simply the following

$$y_i = x_i^\top w. \tag{19.7}$$

We refer to such a problem as the one-unit linear ICA problem, while we refer to 
the parallel linear ICA problem for problems of simultaneous extraction of all 
sources in the form of (19.6). 

Now let $V = [v_1, \ldots, v_m] \in O(m)$ be the mixing matrix in the whitened mixing
ICA model (19.3). Following Theorem 11 in [6], all correct solutions of the
parallel linear ICA problem (19.6) are given by

$$\Theta := \{VDP \in O(m)\}, \tag{19.8}$$

where $D = \operatorname{diag}(\varepsilon_1, \ldots, \varepsilon_m)$ with $\varepsilon_i \in \{\pm 1\}$ and $P = [e_{\tau(1)}, \ldots, e_{\tau(m)}] \in O(m)$ a
permutation matrix with $\tau\colon \{1, \ldots, m\} \to \{1, \ldots, m\}$ being a permutation.
Straightforwardly, all correct solutions for the one-unit linear ICA problem (19.7)
form the set

$$\Upsilon := \{\varepsilon v_i \mid V = [v_1, \ldots, v_m] \in O(m) \text{ and } \varepsilon \in \{\pm 1\}\}. \tag{19.9}$$
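Membership in the solution set (19.8) is easy to test in a simulation where the mixing matrix is known: $X$ is a correct separation exactly when $V^\top X$ is a signed permutation matrix. The sketch below assumes NumPy; the function names and the tolerance are illustrative and not part of the original text.

```python
import numpy as np

def is_signed_permutation(M, tol=1e-8):
    """True if M has exactly one entry of modulus 1 per row and column, zeros elsewhere."""
    B = np.abs(np.asarray(M, dtype=float))
    R = np.round(B)
    return bool(np.allclose(B, R, atol=tol)
                and np.all(R.sum(axis=0) == 1)
                and np.all(R.sum(axis=1) == 1))

def is_correct_separation(X, V, tol=1e-8):
    """Check X = V D P for some sign matrix D and permutation matrix P."""
    return is_signed_permutation(V.T @ X, tol)

# Hypothetical orthogonal mixing matrix obtained from a QR factorisation.
V, _ = np.linalg.qr(np.array([[1.0, 2.0, 0.0],
                              [0.0, 1.0, 3.0],
                              [4.0, 0.0, 1.0]]))
X_good = V[:, [2, 0, 1]] * np.array([1.0, -1.0, 1.0])   # permuted, sign-flipped columns
c, s_ = np.cos(0.3), np.sin(0.3)
rot = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
X_bad = V @ rot                                          # sources 1 and 2 remain mixed
print(is_correct_separation(X_good, V), is_correct_separation(X_bad, V))
```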

One popular category of ICA methods is the so-called contrast-based ICA 
method. A general scheme of such methods involves an optimisation procedure of 
a contrast function, which measures the statistical independence between recov- 
ered signals. Usually, correct separations of sources are expected to be obtained at 
certain optimal points of contrast functions. A simple approach to designing ICA 
contrasts is to construct their statistical independence measure by using some 
parameterised families of functions, in accordance with certain hypotheses about
the distributions (probability density functions) of sources. The corresponding ICA
methods are usually referred to as parametric ICA methods.

Let us define the $(m-1)$-dimensional unit sphere by

$$S^{m-1} := \{x \in \mathbb{R}^m \mid \|x\| = 1\}. \tag{19.10}$$

A generic parametric one-unit linear ICA contrast for the model (19.7) is usually 
formulated as follows 

$$f\colon S^{m-1} \to \mathbb{R}, \qquad f(x) := \mathbb{E}[G(x^\top w)], \tag{19.11}$$

where $\mathbb{E}[\cdot]$ denotes the expectation over the observation $w$ and the function $G\colon \mathbb{R} \to
\mathbb{R}$ is a smooth non-linear function, which is chosen according to the specific
application. Additionally, $G$ is often assumed to be even [12]. We stick to this
assumption throughout this paper. Simply, for the parallel linear ICA problem
(19.6), the corresponding parametric contrast function is given as follows, cf. [8],

$$F\colon O(m) \to \mathbb{R}, \qquad F(X) := \sum_{i=1}^{m} \mathbb{E}[G(x_i^\top w)]. \tag{19.12}$$

The performance of corresponding linear ICA methods developed via optimising 
the above contrast functions / and F is significantly dependent on choices of the 
function G and statistical properties of the source signals s. 

Note that in this paper, all computations regarding the one-unit ICA problem,
i.e., optimising the contrast function $f$, are performed using coordinate functions of
$\mathbb{R}^m$, which is the embedding space of $S^{m-1}$. The tangent space $T_x S^{m-1}$ of $S^{m-1}$ at
$x \in S^{m-1}$ is given by

$$T_x S^{m-1} := \{\xi \in \mathbb{R}^m \mid x^\top \xi = 0\}. \tag{19.13}$$

For the parallel ICA problem (19.12), we consider the orthogonal group $O(m)$ as
an embedded submanifold of $\mathbb{R}^{m \times m}$. The tangent space $T_X O(m)$ of $O(m)$ at $X \in
O(m)$ is given by

$$T_X O(m) = \{\Xi \in \mathbb{R}^{m \times m} \mid X^\top \Xi + \Xi^\top X = 0\}. \tag{19.14}$$

Therefore, $S^{m-1}$ and $T_x S^{m-1}$ are considered as submanifolds of the embedding
space $\mathbb{R}^m$, similarly, $O(m) \subset \mathbb{R}^{m \times m}$ and $T_X O(m) \subset \mathbb{R}^{m \times m}$. For an introduction to
differential geometry, we refer to [4, 26]. For an introduction to differential
geometry in relation to optimisation we refer to [1].



19.3 Two Interpretations of FastICA 

The original FastICA algorithm was developed as an approximate Newton method 
with a heuristic approximation of Hessians, which optimises the one-unit ICA 
contrast function $f$ (19.11) by using a standard Lagrange multiplier approach [11].

Algorithm 1 The FastICA algorithm

Step 1: Given an initial guess $x^{(0)} \in S^{m-1}$ and set $k = 0$.
Step 2: Compute $x^{(k+1)} = \mathbb{E}[G'(x^{(k)\top} w)\,w] - \mathbb{E}[G''(x^{(k)\top} w)]\,x^{(k)}$.
Step 3: Update $x^{(k+1)} \leftarrow x^{(k+1)} / \|x^{(k+1)}\|$.
Step 4: If $\|x^{(k+1)} - x^{(k)}\|$ is small enough, stop.
        Otherwise, set $k = k + 1$ and go to Step 2.

Here, $G'$, $G''$ are the first and second derivatives of the nonlinear function $G$.
Each iteration of FastICA can be considered as the map

$$\phi_f\colon S^{m-1} \to S^{m-1}, \qquad x \mapsto \frac{\mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]x}{\|\mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]x\|}. \tag{19.15}$$

In this section, we will investigate two interpretations of FastICA, i.e. the
algorithmic map $\phi_f$ (19.15), in the framework of a scalar shift strategy and an
approximate Newton method.
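Algorithm 1 can be sketched in a few lines. The following is an illustrative implementation rather than a reference one, under these assumptions: NumPy is used, expectations are replaced by sample means, the contrast is $G(u) = \log\cosh(u)$ (so $G' = \tanh$ and $G'' = 1 - \tanh^2$), and the stopping test is made insensitive to the sign of the iterate, which deviates slightly from Step 4. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fastica_map(x, w):
    """One application of the FastICA map (19.15) with G(u) = log cosh(u).

    x : unit vector of length m;  w : whitened data of shape (m, n).
    Expectations are approximated by sample means.
    """
    y = x @ w                                   # projections x^T w
    g = np.tanh(y)                              # G'(y)
    dg = 1.0 - g ** 2                           # G''(y)
    x_new = (w * g).mean(axis=1) - dg.mean() * x
    return x_new / np.linalg.norm(x_new)

def fastica_one_unit(w, x0, tol=1e-10, max_iter=200):
    """Iterate Algorithm 1; the stopping test ignores the sign of the iterate."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        x_new = fastica_map(x, w)
        # min(...) makes the test insensitive to a possible sign flip
        if min(np.linalg.norm(x_new - x), np.linalg.norm(x_new + x)) < tol:
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(1)
n = 20000
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(3, n))   # unit-variance sources
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))                 # orthogonal mixing matrix
w = V @ s                                                    # whitened model w = V s
x = fastica_one_unit(w, rng.normal(size=3))
print(np.round(np.abs(V.T @ x), 3))   # one entry should be close to 1
```

The vector $|V^\top x|$ shows which source was extracted; it is close to a standard basis vector only up to sampling error.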



19.3.1 Critical Point Analysis of the One-Unit ICA Contrast 

We start with the analysis of the one-unit ICA contrast function $f$, which plays an
important role in the subsequent development and analysis.

First of all, recall the definition of a great circle $\gamma_x$ of $S^{m-1}$ at $x \in S^{m-1}$ as

$$\gamma_x\colon \mathbb{R} \to S^{m-1}, \qquad \gamma_x(t) := \exp\!\big(t(\xi x^\top - x \xi^\top)\big)x, \tag{19.16}$$

where $\xi \in T_x S^{m-1}$ and $\exp(\cdot)$ denotes matrix exponentiation. Obviously, $\gamma_x(0) = x$
and $\dot{\gamma}_x(0) = \xi$. As a matter of fact, great circles are geodesics on the unit sphere
with respect to the Riemannian metric induced by the Euclidean metric of the
embedding space $\mathbb{R}^m$. Now let us compute the first derivative of $f$

$$\mathrm{D}f(x)\colon T_x S^{m-1} \to \mathbb{R}, \tag{19.17}$$

which assigns to an arbitrary tangent vector $\xi \in T_x S^{m-1}$ the value

$$\mathrm{D}f(x)\xi = \left.\frac{\mathrm{d}}{\mathrm{d}t}(f \circ \gamma_x)(t)\right|_{t=0} = \mathbb{E}[G'(x^\top w)\,\xi^\top w]. \tag{19.18}$$

Thus, critical points of $f$ can be characterised as solutions of

$$\xi^\top \mathbb{E}[G'(x^\top w)w] = 0 \tag{19.19}$$

for all $\xi \in T_x S^{m-1}$. This is further equivalent to saying that

$$\mathbb{E}[G'(x^\top w)w] = \lambda x, \tag{19.20}$$

with $\lambda \in \mathbb{R}$.

It is worthwhile to notice that the critical point condition for $f$ depends
significantly on both the function $G$ and the statistical features of the sources $s$. It
therefore appears to be hardly possible to fully characterise all critical points
respecting all, possibly unknown, statistical features. Here we at least show that
any $x^* \in \Upsilon$, which corresponds to a correct separation of a single source, fulfills
the critical point condition (19.19).

Let us denote the left-hand side of (19.20) as follows

$$k\colon S^{m-1} \to \mathbb{R}^m, \qquad k(x) := \mathbb{E}[G'(x^\top w)w]. \tag{19.21}$$

For any $x^* = \varepsilon v_i \in \Upsilon$, one computes

$$k(x^*) = \mathbb{E}[G'(\varepsilon v_i^\top V s)\,V s] = V\,\mathbb{E}[G'(\varepsilon s_i)\,s] = \varepsilon V\,\mathbb{E}[G'(s_i)\,s], \tag{19.22}$$

following the fact that $G$ is even, i.e., $G'$ is odd. Using the mutual statistical
independence and centering property (zero mean) of sources, the entries of the
expression $\mathbb{E}[G'(s_i)s]$ are computed as

$$\mathbb{E}[G'(s_i)s_j] = \begin{cases} \mathbb{E}[G'(s_i)s_i], & j = i, \\ \mathbb{E}[G'(s_i)]\,\mathbb{E}[s_j] = 0, & j \neq i, \end{cases} \tag{19.23}$$

i.e.,

$$\mathbb{E}[G'(s_i)s] = \mathbb{E}[G'(s_i)s_i]\,e_i, \tag{19.24}$$

where $e_i$ denotes the $i$-th standard basis vector of $\mathbb{R}^m$. It is then clear that the
critical point condition for $f$ (19.19) holds true at any $x^* \in \Upsilon$, i.e.

$$k(x^*) = \mathbb{E}[G'(s_i)s_i]\,x^*. \tag{19.25}$$

Note, that there might exist more critical points of the contrast $f$, which do not
correspond to a correct separation of sources.
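The critical point relation (19.25) can be checked on simulated data. The sketch below is illustrative only: it assumes NumPy, the contrast $G(u) = \log\cosh(u)$ with $G' = \tanh$, uniform unit-variance sources, and sample means in place of expectations, so the residual vanishes only up to sampling error. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(3, n))   # independent sources
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))                 # orthogonal mixing matrix
w = V @ s                                                    # whitened mixtures

def k(x, w):
    """Sample estimate of k(x) = E[G'(x^T w) w] for G' = tanh."""
    return (w * np.tanh(x @ w)).mean(axis=1)

i = 1
x_star = -V[:, i]                                  # x* = eps * v_i with eps = -1
lam = np.mean(np.tanh(s[i]) * s[i])                # estimate of E[G'(s_i) s_i]
residual = k(x_star, w) - lam * x_star             # should be close to zero
print(np.linalg.norm(residual))
```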

Now, we compute the Riemannian Hessian of $f$, i.e. the symmetric bilinear
form

$$\mathcal{H}f(x)\colon T_x S^{m-1} \times T_x S^{m-1} \to \mathbb{R}, \tag{19.26}$$

given by

$$\mathcal{H}f(x)(\xi, \xi) = \left.\frac{\mathrm{d}^2}{\mathrm{d}t^2}(f \circ \gamma_x)(t)\right|_{t=0} = \xi^\top \big(\mathbb{E}[G''(x^\top w)ww^\top] - \mathbb{E}[G'(x^\top w)(x^\top w)]\,I_m\big)\xi. \tag{19.27}$$

Note that $\gamma_x(t)$ is a geodesic through $x$ as defined in (19.16). Let us denote the first
summand of (19.27) by

$$H\colon S^{m-1} \to \mathbb{R}^{m \times m}, \qquad H(x) := \mathbb{E}[G''(x^\top w)ww^\top]. \tag{19.28}$$

For $x^* \in \Upsilon$, we get

$$H(x^*) = V\,\mathbb{E}[G''(\varepsilon s_i)ss^\top]\,V^\top = V\,\mathbb{E}[G''(s_i)ss^\top]\,V^\top, \tag{19.29}$$

by the fact of $G''$ being even. Again, by applying mutual statistical independence
and whitening properties of the sources, the $(p, q)$-th entry, for all $p, q = 1, \ldots, m$,
of the expression $\mathbb{E}[G''(s_i)ss^\top]$ can be computed as

(i) if $p = q \neq i$, then

$$\mathbb{E}[G''(s_i)s_p s_q] = \mathbb{E}[G''(s_i)s_p^2] = \mathbb{E}[G''(s_i)]\,\mathbb{E}[s_p^2] = \mathbb{E}[G''(s_i)]; \tag{19.30}$$

(ii) if $p = q = i$, then

$$\mathbb{E}[G''(s_i)s_p s_q] = \mathbb{E}[G''(s_i)s_i^2]; \tag{19.31}$$

(iii) if $p \neq q$, without loss of generality, we can assume that $q \neq i$. Then

$$\mathbb{E}[G''(s_i)s_p s_q] = \mathbb{E}[G''(s_i)s_p]\,\mathbb{E}[s_q] = 0. \tag{19.32}$$

Thus the expression $\mathbb{E}[G''(s_i)ss^\top]$ is indeed a diagonal matrix with the $i$-th
diagonal entry equal to $\mathbb{E}[G''(s_i)s_i^2]$ and all others being equal to $\mathbb{E}[G''(s_i)]$. A direct
calculation leads to

$$H(x^*) = \mathbb{E}[G''(s_i)]\,I_m + \big(\mathbb{E}[G''(s_i)s_i^2] - \mathbb{E}[G''(s_i)]\big)\,x^* x^{*\top}. \tag{19.33}$$

Hence, we compute

$$\begin{aligned} \left.\frac{\mathrm{d}^2}{\mathrm{d}t^2}(f \circ \gamma_{x^*})(t)\right|_{t=0} &= \xi^\top H(x^*)\xi - \mathbb{E}[G'(\varepsilon v_i^\top w)(\varepsilon v_i^\top w)]\,\xi^\top \xi \\ &= \mathbb{E}[G''(s_i)]\,\xi^\top \xi - \mathbb{E}[G'(\varepsilon s_i)(\varepsilon s_i)]\,\xi^\top \xi \\ &= \big(\mathbb{E}[G''(s_i)] - \mathbb{E}[G'(s_i)s_i]\big)\,\xi^\top \xi, \end{aligned} \tag{19.34}$$

i.e., the Hessian $\mathcal{H}f(x)$ of $f$ at the critical point $x^*$ acts on a tangent vector $\xi$ simply
by scalar multiplication

$$\mathcal{H}f(x^*)\xi = \big(\mathbb{E}[G''(s_i)] - \mathbb{E}[G'(s_i)s_i]\big)\,\xi. \tag{19.35}$$

From a statistical point of view, refer to Theorem 1 in [11], it is usually assumed
that

Assumption 2 The nonlinear function $G\colon \mathbb{R} \to \mathbb{R}$ is even and chosen such that the
following inequality holds true for all sources in the parallel linear ICA problem
(19.3), i.e.,

$$\mathbb{E}[G''(s_i)] - \mathbb{E}[G'(s_i)s_i] \neq 0, \tag{19.36}$$

for all $i = 1, \ldots, m$.

Thus, the inverse of the Hessian of $f$ can be ensured to exist at any $x^* \in \Upsilon$. We
then conclude

Corollary 1.1 Let $\Upsilon \subset S^{m-1}$, defined as in (19.9), be the set of solutions of the
one-unit linear ICA problem (19.7) and $G\colon \mathbb{R} \to \mathbb{R}$ a smooth function satisfying
Assumption 2. Then any $x^* \in \Upsilon$ is a non-degenerate critical point of the one-unit
linear ICA contrast $f$.



19.3.2 FastICA as a Scalar Shifted Algorithm 

In this subsection, we will generalise the FastICA algorithm $\phi_f$ (19.15) in the
framework of a scalar shift strategy. The scalar shift strategy utilised in FastICA
accelerates a simpler one-unit linear ICA algorithm to converge locally
quadratically fast to a correct separation.

Now, let $x^* = \varepsilon v_i \in \Upsilon$, i.e. $x^*$ correctly recovers the $i$-th source signal $s_i$. Then
following Eq. 19.15, we get

$$\phi_f(x^*) = \operatorname{sign}\big(\mathbb{E}[G'(s_i)s_i] - \mathbb{E}[G''(s_i)]\big)\,x^*. \tag{19.37}$$

It is easily seen that, if $\mathbb{E}[G'(s_i)s_i] - \mathbb{E}[G''(s_i)] < 0$, i.e., $\phi_f(x^*) = -x^*$ and
$\phi_f(-x^*) = x^*$, the FastICA algorithm oscillates between neighborhoods of two
antipodes $\pm x^* \in S^{m-1}$, both of which recover the same single source up to sign. A
closer look at the FastICA map $\phi_f$ (19.15) suggests that the second term, i.e. the
expression $\mathbb{E}[G''(x^\top w)]$, in the numerator of $\phi_f$ can be treated as a scalar shift, i.e.,
$x \mapsto \mathbb{E}[G''(x^\top w)]x$. In other words, the FastICA map $\phi_f$ is simply a scalar shifted
version of the following one-unit ICA algorithmic map proposed in [19]

$$\phi\colon S^{m-1} \to S^{m-1}, \qquad x \mapsto \frac{\mathbb{E}[G'(x^\top w)w]}{\|\mathbb{E}[G'(x^\top w)w]\|}. \tag{19.38}$$

Note that any $x^* \in \Upsilon$ is a fixed point of $\phi$ simply because of Eq. 19.25. By
replacing the scalar term $\mathbb{E}[G''(x^\top w)]$ in (19.15) by another smooth real-valued
function $\rho\colon S^{m-1} \to \mathbb{R}$, we construct a more general scalar shifted version of the
simple ICA map $\phi$ as follows

$$\phi_s\colon S^{m-1} \to S^{m-1}, \qquad x \mapsto \frac{\mathbb{E}[G'(x^\top w)w] - \rho(x)x}{\|\mathbb{E}[G'(x^\top w)w] - \rho(x)x\|}. \tag{19.39}$$

Let us denote the numerator in (19.39) as

$$\pi\colon S^{m-1} \to \mathbb{R}^m, \qquad \pi(x) := \mathbb{E}[G'(x^\top w)w] - \rho(x)x, \tag{19.40}$$

and let $x^* \in \Upsilon$, i.e., $x^*$ is a correct separation point of the one-unit linear ICA
problem (19.7). Following the result (19.25), one gets

$$\pi(x^*) = \big(\mathbb{E}[G'(s_i)s_i] - \rho(x^*)\big)\,x^*. \tag{19.41}$$

Clearly, the map $\phi_s$ will still suffer from the sign flipping phenomenon as the
FastICA iteration does, when the expression $\mathbb{E}[G'(s_i)s_i] - \rho(x^*)$ is negative. This
discontinuity of the map $\phi_s$ can be removed by considering the unique mapping on
the real projective space $\mathbb{RP}^{m-1}$ induced by $\phi_s$, refer to [23] for more details. In
this paper, however, we take an alternative approach of introducing a proper sign
correction term to tackle the discontinuity issue.

Let us define a scalar function by

$$\beta\colon S^{m-1} \to \mathbb{R}, \qquad \beta(x) := x^\top \pi(x) = \mathbb{E}[G'(x^\top w)\,x^\top w] - \rho(x), \tag{19.42}$$

which indicates the angle between two consecutive iterates produced by the map $\phi_s$.
We construct the following algorithmic map

$$\widetilde{\phi}_s\colon S^{m-1} \to S^{m-1}, \qquad x \mapsto \frac{\beta(x)\pi(x)}{\|\beta(x)\pi(x)\|}, \tag{19.43}$$

and further denote the numerator of $\widetilde{\phi}_s$ by

$$\sigma\colon S^{m-1} \to \mathbb{R}^m, \qquad \sigma(x) := \beta(x)\pi(x). \tag{19.44}$$

The equation (19.41) easily leads to

$$\sigma(x^*) = (\beta(x^*))^2\,x^*, \tag{19.45}$$

thus $\widetilde{\phi}_s(x^*) = x^*$. Obviously, to make the map $\widetilde{\phi}_s$ (19.43) well defined around $x^*$,
it requires that $\beta(x^*) \neq 0$. Thus we just showed

Corollary 1.2 Let $\Upsilon \subset S^{m-1}$, defined as in (19.9), be the set of solutions of the
one-unit linear ICA problem (19.7). Let the smooth scalar shift $\rho\colon S^{m-1} \to \mathbb{R}$
satisfy the condition

$$\rho(x^*) \neq \mathbb{E}[G'(s_i)s_i], \tag{19.46}$$

for any $x^* = \varepsilon v_i \in \Upsilon$. Then each point $x^* \in \Upsilon$ is a fixed point of the algorithmic
map $\widetilde{\phi}_s$.

In the following, we will investigate the conditions on the scalar shift $\rho$ more
closely, so that the resulting scalar shifted algorithm converges locally
quadratically fast to a correct separation point. Later on we will give an example for $\rho$ to
show that such a scalar shift actually exists. According to Lemmas 2.7-2.9 in [15],
we only need to show under which conditions on $\rho$ the first derivative of $\widetilde{\phi}_s$

$$\mathrm{D}\widetilde{\phi}_s(x)\colon T_x S^{m-1} \to T_{\widetilde{\phi}_s(x)} S^{m-1} \tag{19.47}$$

will vanish at $x^*$. Let $\|\sigma\|^2 = \sigma^\top \sigma$, one computes

$$\left.\mathrm{D}\widetilde{\phi}_s(x)\xi\right|_{x=x^*} = \frac{1}{\|\sigma(x^*)\|}\underbrace{\left(I_m - \frac{\sigma(x^*)\sigma^\top(x^*)}{\|\sigma(x^*)\|^2}\right)}_{=:\,\Pi(x^*)} \left.\mathrm{D}\sigma(x)\xi\right|_{x=x^*}. \tag{19.48}$$

By Corollary 1.2, we know that $\widetilde{\phi}_s(x^*) = x^*$. Therefore the expression $\Pi(x^*)$ is
an orthogonal projection operator onto the complement of $\operatorname{span}(x^*)$. Consequently,
the algorithmic mapping $\widetilde{\phi}_s$ converges locally quadratically fast to $x^*$ if and only
if the expression $\mathrm{D}\sigma(x)\xi|_{x=x^*}$ is equal to a scalar multiple of $x^*$. A simple
computation shows

$$\left.\mathrm{D}\sigma(x)\xi\right|_{x=x^*} = \left.\mathrm{D}\beta(x)\xi\right|_{x=x^*}\pi(x^*) + \beta(x^*)\left.\mathrm{D}\pi(x)\xi\right|_{x=x^*}. \tag{19.49}$$

Following the fact that $\mathrm{D}\beta(x)\xi|_{x=x^*}$ is a scalar and using Eq. 19.41, the first
summand in (19.49) is equal to a scalar multiple of $x^*$. Now we compute

$$\left.\mathrm{D}\pi(x)\xi\right|_{x=x^*} = \left.\mathrm{D}\,\mathbb{E}[G'(x^\top w)w]\,\xi\right|_{x=x^*} - \rho(x^*)\xi - \left.\mathrm{D}\rho(x)\xi\right|_{x=x^*}x^*. \tag{19.50}$$

Following Eq. 19.33, the first summand in (19.50) is computed as

$$\left.\mathrm{D}k(x)\xi\right|_{x=x^*} = \left.\mathrm{D}\,\mathbb{E}[G'(x^\top w)w]\,\xi\right|_{x=x^*} = \mathbb{E}[G''(x^{*\top}w)ww^\top]\,\xi = \mathbb{E}[G''(s_i)]\,\xi. \tag{19.51}$$

Then, due to the fact that the third summand in (19.50) is already a scalar
multiple of $x^*$, the expressions $\mathrm{D}\pi(x)\xi|_{x=x^*}$ and $\mathrm{D}\sigma(x)\xi|_{x=x^*}$ are both equal
to scalar multiples of $x^*$ if and only if the following equality holds true

$$\mathbb{E}[G''(s_i)]\,\xi - \rho(x^*)\,\xi = \lambda x^*, \tag{19.52}$$

where $\lambda \in \mathbb{R}$, i.e., $\rho(x^*) = \mathbb{E}[G''(s_i)]$. Thus the convergence properties of the
map $\widetilde{\phi}_s$ can be summarised in the following theorem.

Theorem 1.3 Let $\Upsilon \subset S^{m-1}$, defined as in (19.9), be the set of solutions of the
one-unit linear ICA problem (19.7). Let the smooth scalar shift $\rho\colon S^{m-1} \to \mathbb{R}$
satisfy the condition (19.46). Then the algorithmic map $\widetilde{\phi}_s$ is locally quadratically
convergent to $x^* = \varepsilon v_i \in \Upsilon$ if and only if

$$\rho(x^*) = \mathbb{E}[G''(s_i)]. \tag{19.53}$$

Remark 1 Consequently, the scalar shift utilised in FastICA, i.e.,

$$\rho_f\colon S^{m-1} \to \mathbb{R}, \qquad \rho_f(x) := \mathbb{E}[G''(x^\top w)], \tag{19.54}$$

is actually a simple choice of a scalar shift strategy, satisfying the condition
(19.53). It therefore accelerates the map $\widetilde{\phi}_s$ to converge locally quadratically fast
to a correct separation.
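The effect of the sign correction can be seen in a small experiment. The sketch below is a hedged illustration, not the authors' code: it assumes NumPy, the contrast $G(u) = \log\cosh(u)$, sample means in place of expectations, and Laplace-distributed (super-Gaussian) sources, for which the quantity $\mathbb{E}[G'(s_i)s_i] - \mathbb{E}[G''(s_i)]$ is negative so that the uncorrected map flips the sign of its fixed point. The names `shifted_map` and `rho_fastica` are hypothetical.

```python
import numpy as np

def shifted_map(x, w, rho, sign_correct=True):
    """Scalar shifted ICA map with G(u) = log cosh(u).

    rho : callable giving the scalar shift rho(x, w).
    With sign_correct=True the numerator is multiplied by beta(x) = x^T pi(x).
    """
    y = x @ w
    pi = (w * np.tanh(y)).mean(axis=1) - rho(x, w) * x    # numerator pi(x)
    beta = x @ pi                                         # sign correction term
    sigma = beta * pi if sign_correct else pi
    return sigma / np.linalg.norm(sigma)

def rho_fastica(x, w):
    """FastICA shift: sample mean of G''(x^T w) = 1 - tanh(x^T w)^2."""
    return np.mean(1.0 - np.tanh(x @ w) ** 2)

rng = np.random.default_rng(3)
n = 20000
s = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(3, n))   # super-Gaussian, unit variance
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))
w = V @ s

x = V[:, 0] + 0.1 * rng.normal(size=3)                   # start near a solution
x /= np.linalg.norm(x)
for _ in range(100):
    x = shifted_map(x, w, rho_fastica)

step_corrected = shifted_map(x, w, rho_fastica)                      # stays at x
step_plain = shifted_map(x, w, rho_fastica, sign_correct=False)      # flips to -x
print(np.linalg.norm(step_corrected - x), np.linalg.norm(step_plain + x))
```

At the converged point the corrected map reproduces $x$ while the uncorrected map returns $-x$, which is the oscillation between antipodes described for (19.37).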

19.3.3 FastICA as an Approximate Newton Algorithm 

As pointed out before, the original FastICA algorithm was developed by using
some heuristic approximation of the Hessian. In this subsection, we will develop
an approximate Newton one-unit linear ICA method by using geometric
optimisation techniques. The approximation of Hessians we employ here is rigorously
justified using the statistical features of the linear ICA problem. FastICA is shown
to be just a special case of our proposed method.

Let $S^{m-1} \subset \mathbb{R}^m$ be endowed with the Riemannian metric induced from the
standard scalar product of the embedding $\mathbb{R}^m$. The geodesics with respect to this
metric are precisely the great circles $\gamma_x$ (19.16). Thus, according to Eq. 19.18, the
Riemannian gradient of $f$ at $x \in S^{m-1}$ can be computed as

$$\nabla f(x) = (I_m - xx^\top)\,\mathbb{E}[G'(x^\top w)w]. \tag{19.55}$$

Here, $I_m - xx^\top$ is the orthogonal projection operator onto the tangent space
$T_x S^{m-1}$. Computing the second derivative of $f$ along a great circle, a Newton
direction $\xi \in T_x S^{m-1}$ can be computed by solving the following linear system

$$(I_m - xx^\top)\big(\mathbb{E}[G''(x^\top w)ww^\top] - \mathbb{E}[G'(x^\top w)x^\top w]\,I_m\big)\xi = -(I_m - xx^\top)\,\mathbb{E}[G'(x^\top w)w]. \tag{19.56}$$

Thus, a single iteration of a Riemannian Newton method for optimising the
contrast $f$ can be easily completed by projecting the above Newton step $\xi$ back onto
$S^{m-1}$ along a great circle (19.16) uniquely defined by the direction $\xi \in T_x S^{m-1}$.

However, there exists a serious drawback of such a standard approach. The
expression $\mathbb{E}[G''(x^\top w)ww^\top]$ at the left-hand side of (19.56) is an $m \times m$ dense
matrix, often expensive to evaluate. In order to avoid computing the true Hessian,
one would prefer to have certain approximations of the true Hessian with low
computational cost. Suggested by the property of the Hessian of $f$ at critical points
$x^* \in \Upsilon$ being scalar, it is sensible to approximate the expression $\mathbb{E}[G''(x^\top w)ww^\top] -
\mathbb{E}[G'(x^\top w)x^\top w]\,I_m$ at arbitrary $x \in S^{m-1}$ within an open neighborhood $U_{x^*} \subset S^{m-1}$
around $x^*$ by the following scalar matrix,

$$\big(\mathbb{E}[G''(x^\top w)] - \mathbb{E}[G'(x^\top w)x^\top w]\big)\,I_m, \tag{19.57}$$

which obviously gives the correct expression at $x^*$. Thus, an approximate Newton
direction $\xi_a \in T_x S^{m-1}$ for optimising the one-unit ICA contrast $f$ can be computed
by solving the following linear system

$$(I_m - xx^\top)\big(\mathbb{E}[G''(x^\top w)] - \mathbb{E}[G'(x^\top w)x^\top w]\big)\xi_a = -(I_m - xx^\top)\,\mathbb{E}[G'(x^\top w)w]. \tag{19.58}$$

Note that by construction, $(\mathbb{E}[G''(x^\top w)] - \mathbb{E}[G'(x^\top w)x^\top w])\,\xi_a$ lies in the tangent
space $T_x S^{m-1}$. The projection $I_m - xx^\top$ on the left-hand side of (19.58) is therefore
redundant. Thus, the solution of (19.58) in terms of $\xi_a$ is simply given by

$$\xi_a = -\frac{(I_m - xx^\top)\,\mathbb{E}[G'(x^\top w)w]}{\mathbb{E}[G''(x^\top w)] - \mathbb{E}[G'(x^\top w)x^\top w]}. \tag{19.59}$$

Although evaluating the great circle (19.16) is not too costly, we prefer to use the
orthogonal projection instead, which costs slightly fewer computations than $\gamma_x$
(19.16). Here, we specify the following curve on $S^{m-1}$ through $x \in S^{m-1}$

$$\mu_x\colon (-\kappa, \kappa) \to S^{m-1}, \qquad t \mapsto \frac{x + t\xi}{\|x + t\xi\|}, \tag{19.60}$$

with $\kappa > 0$ and $\xi \in T_x S^{m-1}$ arbitrary. Obviously, $\mu_x(0) = x$ and $\dot{\mu}_x(0) = \xi$. A
similar idea of replacing geodesics (great circles) by certain smooth curves on
manifolds has already been explored in [3, 10, 24]. By substituting the approximate
Newton direction $\xi_a$ into (19.60), we end up with an approximate Newton
method for optimising the one-unit ICA contrast $f$. This leads to

$$\phi_n\colon S^{m-1} \to S^{m-1}, \qquad x \mapsto \frac{\frac{1}{\eta(x)}\big(\mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]x\big)}{\big\|\frac{1}{\eta(x)}\big(\mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]x\big)\big\|}, \tag{19.61}$$

where

$$\eta\colon S^{m-1} \to \mathbb{R}, \qquad \eta(x) := \mathbb{E}[G'(x^\top w)x^\top w] - \mathbb{E}[G''(x^\top w)]. \tag{19.62}$$

Remark 2 If $\eta(x) > 0$ holds always, it is easy to see that the map $\phi_n$ is actually
the FastICA map $\phi_f$.
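For completeness, the substitution that turns the approximate Newton direction into the map $\phi_n$ can be written out. This is a short worked step, assuming the usual Newton convention that the Hessian applied to the direction equals minus the gradient; the abbreviations $k = \mathbb{E}[G'(x^\top w)w]$ and $g = \mathbb{E}[G''(x^\top w)]$ are introduced here only for brevity, so that $\eta(x) = x^\top k - g$.

```latex
\begin{aligned}
\xi_a &= \frac{(I_m - xx^\top)\,k}{x^\top k - g}
       = \frac{k - (x^\top k)\,x}{\eta(x)}, \\
x + \xi_a &= \frac{(x^\top k - g)\,x + k - (x^\top k)\,x}{\eta(x)}
           = \frac{k - g\,x}{\eta(x)}
           = \frac{1}{\eta(x)}\Big(\mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]\,x\Big).
\end{aligned}
```

Normalising $x + \xi_a$ to unit length then gives $\phi_n$; the factor $1/\eta(x)$ survives the normalisation only through its sign, which is why $\phi_n$ coincides with the FastICA map whenever $\eta(x) > 0$.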

Moreover, the map $\phi_n$ is identical to the scalar shifted ICA map $\widetilde{\phi}_s$ (19.43)
when employing the FastICA shift $\rho_f$ (19.54). Straightforwardly following
Theorem 1.3 and Remark 1, the local convergence properties of the algorithmic
map $\phi_n$ can be summarised as follows

Corollary 1.4 Let $\Upsilon \subset S^{m-1}$, defined as in (19.9), be the set of solutions of the
one-unit linear ICA problem (19.7) and $G\colon \mathbb{R} \to \mathbb{R}$ a smooth function satisfying the
condition (19.36). Then the algorithmic map $\phi_n$ is locally quadratically
convergent to $x^* \in \Upsilon$.



19.4 Generalised FastICA for Parallel Linear ICA 

In Sect. 19.3, FastICA is reinterpreted as a special case of a scalar shifted algorithm
and of an approximate Newton method, respectively. A quite natural question is
whether it is possible to generalise FastICA to solve the parallel linear ICA
problem, i.e., to extract all sources simultaneously. Although there exist several
other parallel linear ICA algorithms, e.g., the fixed point Cayley ICA algorithm [8],
the retraction based ICA algorithm [9], the gradient based ICA algorithm [13], and the
geodesic flow based ICA algorithm [18], a thorough discussion of these algorithms
is out of scope for this paper. We refer to [1, 3, 7, 10, 25] for further
reading on optimisation on manifolds and comparisons of different algorithms in
terms of performance and implementation. In this section, we generalise FastICA
using a matrix shift strategy and an approximate Newton method to solve the
parallel linear ICA problem.

19.4.1 Matrix Shifted QR-ICA Algorithm 

First of all, let $Z = Z_Q Z_R$ be the unique QR decomposition of an invertible matrix
$Z \in \mathbb{R}^{m \times m}$ into an orthogonal matrix $Z_Q \in O(m)$ and an upper triangular matrix
$Z_R \in \mathbb{R}^{m \times m}$ with positive diagonal entries. We denote the two factors of the QR
decomposition by

$$Q\colon \mathbb{R}^{m \times m} \to O(m), \qquad X \mapsto X_Q, \tag{19.63}$$

and

$$R\colon \mathbb{R}^{m \times m} \to \{X \in \mathbb{R}^{m \times m} \mid x_{ii} > 0,\ x_{ij} = 0 \text{ for all } i > j\}, \qquad X \mapsto X_R. \tag{19.64}$$

We then construct a simple and direct generalisation of the basic ICA map (19.38)
to a full matrix as

$$\Phi\colon O(m) \to O(m), \qquad X \mapsto (K(X))_Q, \tag{19.65}$$

where

$$K\colon O(m) \to \mathbb{R}^{m \times m}, \qquad K(X) := [k(x_1), \ldots, k(x_m)], \tag{19.66}$$

with $k$ defined in (19.21).

Let $X^* = [x_1^*, \ldots, x_m^*] \in \Theta$ be a correct separation point of the parallel linear
ICA problem, i.e., $x_i^* = \varepsilon_i v_{\tau(i)} \in S^{m-1}$. Using Eq. 19.25, one gets

$$K(X^*) = [k(x_1^*), \ldots, k(x_m^*)] = X^* \Lambda, \tag{19.67}$$

where

$$\Lambda = \operatorname{diag}\big(\mathbb{E}[G'(s_{\tau(1)})s_{\tau(1)}], \ldots, \mathbb{E}[G'(s_{\tau(m)})s_{\tau(m)}]\big). \tag{19.68}$$

Due to the fact that every diagonal entry of $\Lambda$ is positive, one has $(K(X^*))_Q = X^*$,
i.e., any correct separation point $X^* \in \Theta$ is a fixed point of the map $\Phi$ (19.65).

Now, by generalising the concept of a scalar shift strategy to a matrix shift
strategy, we construct a matrix shifted version of $\Phi$ as follows,

$$\Phi_s\colon O(m) \to O(m), \qquad X \mapsto (K(X) - XL(X))_Q, \tag{19.69}$$

where

$$L\colon O(m) \to \mathbb{R}^{m \times m}, \qquad X \mapsto \operatorname{diag}\big(\mathbb{E}[G''(x_1^\top w)], \ldots, \mathbb{E}[G''(x_m^\top w)]\big). \tag{19.70}$$

Here we choose a special shift, namely a diagonal matrix $L$ (19.70). To discuss
more general (dense) matrix shifts is certainly a challenge. We consider such an
issue as an open problem for our future research.

Now each column of the matrix $K(X) - XL(X)$ can be computed individually
by

$$\pi_f\colon S^{m-1} \to \mathbb{R}^m, \qquad \pi_f(x) := \mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]x, \tag{19.71}$$

i.e., the algorithmic map $\Phi_s$ can be restated as

$$\Phi_s\colon O(m) \to O(m), \qquad X \mapsto \big([\pi_f(x_1), \ldots, \pi_f(x_m)]\big)_Q. \tag{19.72}$$

Note that each column $\pi_f$ is indeed the numerator of the original FastICA map $\phi_f$
(19.15) applied to each $x_i$ individually. In other words, the algorithm by iterating
$\Phi_s$ is essentially a natural generalisation of FastICA to a full matrix. Unfortunately,
such a map $\Phi_s$ is still not immune from the phenomenon of column-wise sign
flipping, which is caused by the same reason as for the original FastICA. Thus, by
using the function $\eta$ (19.62) as a column-wise sign correction term, we construct

$$\widetilde{\Phi}_s\colon O(m) \to O(m), \qquad X \mapsto \big([\sigma_f(x_1), \ldots, \sigma_f(x_m)]\big)_Q, \tag{19.73}$$

where

$$\sigma_f\colon S^{m-1} \to \mathbb{R}^m, \qquad \sigma_f(x) := \eta(x)\pi_f(x). \tag{19.74}$$

Note that the first column of $X \in O(m)$ under the map $\widetilde{\Phi}_s$ evolves exactly as the
column $x \in S^{m-1}$ under the algorithmic map $\phi_n$ (19.61). We refer to the algorithm
induced by iterating the map $\widetilde{\Phi}_s$ as matrix shifted QR-ICA algorithm. This
algorithm is summarised as follows.

Algorithm 2 Matrix shifted QR-ICA algorithm

Step 1 Given an initial guess $X^{(0)} = [x_1^{(0)}, \ldots, x_m^{(0)}] \in O(m)$ and set $k = 0$.
Step 2 For $i = 1, \ldots, m$, compute

$$X^{(k+1)} = \big[\sigma_f(x_1^{(k)}), \ldots, \sigma_f(x_m^{(k)})\big],$$

where $\sigma_f(x) = \operatorname{sign}(x^\top \pi_f(x))\,\pi_f(x)$, and

$$\pi_f(x) = \mathbb{E}[G'(x^\top w)w] - \mathbb{E}[G''(x^\top w)]x.$$

Step 3 Update $X^{(k+1)} \leftarrow (X^{(k+1)})_Q$.
Step 4 If $\|X^{(k+1)} - X^{(k)}\|$ is small enough, stop.
       Otherwise, set $k = k + 1$ and go to Step 2.

Here, $\|\cdot\|$ is an arbitrary norm of matrices.
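The steps of Algorithm 2 can be sketched as follows. This is an illustrative implementation under stated assumptions: NumPy is used, expectations are replaced by sample means, the contrast is $G(u) = \log\cosh(u)$, the Frobenius norm serves as the matrix norm, and the iteration is started close to a solution because only local convergence is claimed. The helper `qr_pos`, which enforces a positive diagonal of the triangular factor, and the synthetic Laplace sources are hypothetical choices.

```python
import numpy as np

def qr_pos(Z):
    """QR factors of Z with positive diagonal of R (the unique decomposition)."""
    Q, R = np.linalg.qr(Z)
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    return Q * d, d[:, None] * R

def qr_ica_step(X, w):
    """One step of the matrix shifted QR-ICA map with G(u) = log cosh(u)."""
    Y = X.T @ w                                  # row j holds x_j^T w
    g = np.tanh(Y)                               # G'
    dg = (1.0 - g ** 2).mean(axis=1)             # sample mean of G''(x_j^T w)
    K = w @ g.T / w.shape[1]                     # column j is k(x_j)
    Pi = K - X * dg                              # columns pi_f(x_j)
    eta = np.sum(X * Pi, axis=0)                 # x_j^T pi_f(x_j)
    return qr_pos(Pi * np.sign(eta))[0]          # sign correction, then Q-factor

rng = np.random.default_rng(4)
n = 20000
s = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(3, n))   # unit-variance sources
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))
w = V @ s

X = qr_pos(V + 0.05 * rng.normal(size=(3, 3)))[0]   # initial guess near V
for _ in range(100):
    X_new = qr_ica_step(X, w)
    done = np.linalg.norm(X_new - X) < 1e-10
    X = X_new
    if done:
        break
print(np.round(X.T @ V, 2))                      # close to the identity
```

Multiplying a column by a positive scalar does not change the Q-factor, so using $\operatorname{sign}(\eta)$ instead of $\eta$ itself as the correction term gives the same iterate.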

The convergence properties of the algorithmic map $\widetilde{\Phi}_s$ are then summarised.

Lemma 1.5 Let $\Theta \subset O(m)$, as defined in (19.8), be the set of solutions of the
parallel linear ICA problem (19.6) and $G\colon \mathbb{R} \to \mathbb{R}$ a smooth function satisfying
Assumption 2. Then any $X^* \in \Theta$ is a fixed point of the algorithmic map $\widetilde{\Phi}_s$.

Proof Let us denote

$$K_s\colon O(m) \to \mathbb{R}^{m \times m}, \qquad K_s(X) := [\sigma_f(x_1), \ldots, \sigma_f(x_m)]. \tag{19.75}$$

According to the result (19.25), one computes that

$$\sigma_f(x_i^*) = (\eta(x_i^*))^2\,x_i^*, \tag{19.76}$$

i.e.,

$$K_s(X^*) = X^* \Lambda, \tag{19.77}$$

where

$$\Lambda = \operatorname{diag}\big((\eta(x_1^*))^2, \ldots, (\eta(x_m^*))^2\big). \tag{19.78}$$

The result follows from the fact that $(K_s(X^*))_Q = X^*$. □

To show the local convergence properties of the algorithmic map $\widetilde{\Phi}_s$, we need
the following lemma.

Lemma 1.6 Let $Z \in \mathbb{R}^{m \times m}$ be invertible and consider the unique QR
decomposition of $Z$. Then the derivative of $(Z)_Q$ in direction $Y \in \mathbb{R}^{m \times m}$ is given by

$$\mathrm{D}(Z)_Q Y = (Z)_Q\big(((Z)_Q)^\top Y ((Z)_R)^{-1}\big)_{\mathrm{skew}}, \tag{19.79}$$

where $T_{\mathrm{skew}}$ denotes the skew-symmetric part from the unique additive
decomposition of $T \in \mathbb{R}^{m \times m}$ into skew-symmetric and upper triangular part, i.e.,

$$T = T_{\mathrm{skew}} + T_{\mathrm{upper}}, \tag{19.80}$$

where $T_{\mathrm{skew}} = -(T_{\mathrm{skew}})^\top$ and $(T_{\mathrm{upper}})_{ij} = 0$ for all $i > j$.

Proof See Lemma 1 in [16] for a proof. □
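The derivative formula for the Q-factor can be sanity-checked against a central finite difference. The sketch below assumes NumPy; `qr_pos`, `skew_part`, and `dQ` are hypothetical helper names, and the test matrices are random. The splitting used in `skew_part` takes the strictly lower triangle $L$ of $T$ and returns $L - L^\top$, which leaves an upper triangular remainder.

```python
import numpy as np

def qr_pos(Z):
    """QR factors of Z with positive diagonal of R (the unique decomposition)."""
    Q, R = np.linalg.qr(Z)
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    return Q * d, d[:, None] * R

def skew_part(T):
    """Skew-symmetric part of the splitting T = T_skew + T_upper."""
    L = np.tril(T, -1)          # strictly lower triangle of T
    return L - L.T

def dQ(Z, Y):
    """Directional derivative of the Q-factor at Z in direction Y."""
    Q, R = qr_pos(Z)
    return Q @ skew_part(Q.T @ Y @ np.linalg.inv(R))

rng = np.random.default_rng(5)
Z = np.eye(4) + 0.1 * rng.normal(size=(4, 4))    # invertible test matrix
Y = rng.normal(size=(4, 4))                      # arbitrary direction
h = 1e-6
fd = (qr_pos(Z + h * Y)[0] - qr_pos(Z - h * Y)[0]) / (2 * h)   # central difference
err = np.max(np.abs(fd - dQ(Z, Y)))
print(err)
```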

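Theorem 1.7 Let $\Theta \subset O(m)$, as defined in (19.8), be the set of solutions of the
parallel linear ICA problem (19.6) and $G\colon \mathbb{R} \to \mathbb{R}$ a smooth function satisfying
Assumption 2. Then the algorithmic map $\widetilde{\Phi}_s$ is locally quadratically convergent to
$X^* \in \Theta$.

Proof We will show that the linear map

$$\mathrm{D}\widetilde{\Phi}_s(X)\colon T_X O(m) \to T_{\widetilde{\Phi}_s(X)} O(m) \tag{19.81}$$

at $X^*$ is indeed the zero map. By exploiting the fact that $\widetilde{\Phi}_s(X^*) = (K_s(X^*))_Q = X^*$
and $(K_s(X^*))_R = \Lambda$, one gets

$$\begin{aligned} \left.\mathrm{D}\widetilde{\Phi}_s(X)\Xi\right|_{X=X^*} &= \left.\mathrm{D}(K_s(X))_Q\,\Xi\right|_{X=X^*} \\ &= (K_s(X^*))_Q\Big(((K_s(X^*))_Q)^\top \big(\left.\mathrm{D}K_s(X)\Xi\right|_{X=X^*}\big)((K_s(X^*))_R)^{-1}\Big)_{\mathrm{skew}} \\ &= X^*\Big(X^{*\top}\big(\left.\mathrm{D}K_s(X)\Xi\right|_{X=X^*}\big)\Lambda^{-1}\Big)_{\mathrm{skew}}. \end{aligned} \tag{19.82}$$

Following Theorem 1.3, i.e., for all $i = 1, \ldots, m$,

$$\left.\mathrm{D}\sigma_f(x)\xi\right|_{x=x_i^*} = \delta_i x_i^*, \tag{19.83}$$

with some $\delta_i \in \mathbb{R}$ and denoting $\Delta = \operatorname{diag}(\delta_1, \ldots, \delta_m) \in \mathbb{R}^{m \times m}$, we have

$$\left.\mathrm{D}K_s(X)\Xi\right|_{X=X^*} = X^* \Delta, \tag{19.84}$$

i.e.,

$$\left.\mathrm{D}\widetilde{\Phi}_s(X)\Xi\right|_{X=X^*} = X^*\big(\Delta \Lambda^{-1}\big)_{\mathrm{skew}} = 0. \tag{19.85}$$

Therefore the result follows. □

19.4.2 Approximate Newton-Like Parallel ICA Method

Following the same idea of approximating Hessians as in developing the
approximate Newton one-unit linear ICA method, in this subsection, we develop
an approximate Newton type method for minimising the parallel contrast function
$F$ (19.12).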

19.4.2.1 Critical Point Analysis of Parallel ICA Contrast 

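Let $\mathfrak{so}(m) = \{\Omega \in \mathbb{R}^{m \times m} \mid \Omega = -\Omega^\top\}$ be the vector space of skew-symmetric
matrices and denote $\Omega = [\omega_1, \ldots, \omega_m] = (\omega_{ij})_{i,j=1}^{m} \in \mathfrak{so}(m)$. By means of the
matrix exponential map, a local parameterisation $\mu_X$ of $O(m)$ around $X \in O(m)$ is
given by

$$\mu_X\colon \mathfrak{so}(m) \to O(m), \qquad \mu_X(\Omega) := X\exp(\Omega). \tag{19.86}$$

By using the chain rule, the first derivative of the contrast function $F$ (19.12) at
$X \in O(m)$ in direction $\Xi = X\Omega = [\xi_1, \ldots, \xi_m] \in T_X O(m)$, i.e., $\xi_i = X\omega_i$, is
computed by

$$\begin{aligned} \mathrm{D}F(X)\Xi &= \left.\frac{\mathrm{d}}{\mathrm{d}t}(F \circ \mu_X)(t\Omega)\right|_{t=0} \\ &= \sum_{i=1}^{m} \omega_i^\top X^\top k(x_i) \\ &= \sum_{1 \le i < j \le m} \omega_{ij}\big(x_i^\top k(x_j) - k(x_i)^\top x_j\big). \end{aligned} \tag{19.87}$$

The critical points $X \in O(m)$ of $F$ are therefore characterised by

$$\sum_{1 \le i < j \le m} \omega_{ij}\big(x_i^\top k(x_j) - k(x_i)^\top x_j\big) = 0. \tag{19.88}$$

Analogously to the one-unit linear ICA contrast $f$ in (19.20), we are able to show
that any $X^* \in \Theta$, which corresponds to a correct separation of all sources, is a
critical point of $F$. Let $X^* = [x_1^*, \ldots, x_m^*] \in \Theta$, i.e., $x_i^* = \varepsilon_i v_{\tau(i)} \in S^{m-1}$. Recalling
Eq. 19.25, we compute

$$x_i^{*\top} k(x_j^*) - k(x_i^*)^\top x_j^* = 0, \tag{19.89}$$

for all $i \neq j$.

Since the geodesics through $X \in O(m)$ with respect to the Riemannian metric
$\langle X\Omega_1, X\Omega_2 \rangle := -\operatorname{tr}\Omega_1\Omega_2$ are given by $\mu_X(t\Omega)$ (left translated one parameter
subgroups), the Riemannian Hessian of $F$ can be computed from

$$\begin{aligned} \mathcal{H}F(X)(X\Omega, X\Omega) &= \left.\frac{\mathrm{d}^2}{\mathrm{d}t^2}(F \circ \mu_X)(t\Omega)\right|_{t=0} \\ &= \sum_{i=1}^{m} \mathbb{E}\big[G''(x_i^\top w)(\omega_i^\top X^\top w)^2 - G'(x_i^\top w)(\omega_i^\top \Omega X^\top w)\big] \\ &= \sum_{i=1}^{m} \omega_i^\top X^\top \mathbb{E}[G''(x_i^\top w)ww^\top]\,X\omega_i - \sum_{i=1}^{m} \omega_i^\top \Omega X^\top \mathbb{E}[G'(x_i^\top w)w] \\ &= \sum_{i=1}^{m} \omega_i^\top X^\top H(x_i) X\omega_i + \operatorname{tr}\Omega^2 X^\top K(X). \end{aligned} \tag{19.90}$$

Following Eq. 19.33, the first summand in (19.90) evaluated at $X^* \in \Theta$ is
computed as

$$\begin{aligned} \sum_{i=1}^{m} \omega_i^\top X^{*\top} H(x_i^*) X^* \omega_i &= \sum_{i=1}^{m} \mathbb{E}[G''(s_{\tau(i)})]\,\omega_i^\top \omega_i \\ &= \sum_{1 \le i < j \le m} \omega_{ij}^2\big(\mathbb{E}[G''(s_{\tau(i)})] + \mathbb{E}[G''(s_{\tau(j)})]\big), \end{aligned} \tag{19.91}$$

and the second summand in (19.90) is evaluated as

$$\begin{aligned} \operatorname{tr}\Omega^2 X^{*\top} K(X^*) &= \operatorname{tr}\Omega^2 X^{*\top} X^* \Lambda \\ &= \sum_{1 \le i < j \le m} -\omega_{ij}^2\big(\mathbb{E}[G'(s_{\tau(i)})s_{\tau(i)}] + \mathbb{E}[G'(s_{\tau(j)})s_{\tau(j)}]\big). \end{aligned} \tag{19.92}$$

Hence,

$$\begin{aligned} \mathcal{H}F(X^*)(X^*\Omega, X^*\Omega) &= \sum_{1 \le i < j \le m} \omega_{ij}^2\Big(\big(\mathbb{E}[G''(s_{\tau(i)})] - \mathbb{E}[G'(s_{\tau(i)})s_{\tau(i)}]\big) + \big(\mathbb{E}[G''(s_{\tau(j)})] - \mathbb{E}[G'(s_{\tau(j)})s_{\tau(j)}]\big)\Big) \\ &= \sum_{1 \le i < j \le m} -\omega_{ij}^2\big(\eta(x_i^*) + \eta(x_j^*)\big). \end{aligned} \tag{19.93}$$

Let $\Omega_{ij} \in \mathfrak{so}(m)$ be defined by $\omega_{ij} = -\omega_{ji} = 1$ and zeros anywhere else. We denote
the standard basis of $\mathfrak{so}(m)$ by

$$B = \{\Omega_{ij} \in \mathfrak{so}(m) \mid 1 \le i < j \le m\}. \tag{19.94}$$

It is obvious that $\mathcal{H}F(X^*)(X^*\Omega_{ij}, X^*\Omega_{pq}) = 0$ for any distinct pair of basis matrices
$(\Omega_{ij}, \Omega_{pq}) \subset B$, i.e., the Hessian of $F$ at $X^* \in \Theta$ is indeed diagonal with respect to
the standard basis $B$. Moreover, to ensure that such a Hessian is invertible, it will
require the expression $\eta(x_i^*) + \eta(x_j^*)$ to be nonzero. According to specific
applications, such a condition can be fulfilled by choosing the function $G$ carefully.
A special example is given and discussed in Remark 3. We conclude our results as
follows.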

Theorem 1.8 Let $\Theta \subset O(m)$, as defined in (19.8), be the set of solutions of the
parallel linear ICA problem (19.6) and let $G\colon \mathbb{R} \to \mathbb{R}$ be a smooth even function
such that

$$\mathbb{E}[G''(s_i)] - \mathbb{E}[G'(s_i)s_i] \neq -\big(\mathbb{E}[G''(s_j)] - \mathbb{E}[G'(s_j)s_j]\big), \tag{19.95}$$

for all $i, j = 1, \ldots, m$ and $i \neq j$. Then any $X^* \in \Theta$ is a non-degenerate critical
point of $F$ (19.12).

Remark 3 As pointed out before, the value of the expression $\mathbb{E}[G''(s_i)] -
\mathbb{E}[G'(s_i)s_i]$ depends on both the function $G$ and source signals $s$. It is hardly
possible to characterise the set of critical points in full generality. As a special
case, let us specify $G$ by the following popular choice

$$G\colon \mathbb{R} \to \mathbb{R}, \qquad G(x) := \frac{1}{a}\log(\cosh(ax)), \tag{19.96}$$

where $a \in \mathbb{R}$ is positive. According to the fact that the absolute values of entries of
whitened sources are bounded by the pre-whitening process, it can be shown that,
by choosing $a$ carefully, the value of $\mathbb{E}[G''(s_i)] - \mathbb{E}[G'(s_i)s_i]$ is ensured to be
positive for any signal. In other words, every point $X^* \in \Theta$ is a strict local
minimum of the parallel ICA contrast $F$.
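For reference, the log-cosh nonlinearity and its first two derivatives can be coded directly. This is a minimal sketch assuming NumPy; the parameter name `a` for the positive scaling constant and the function names are choices made here. The function value is computed through `numpy.logaddexp` to avoid overflow of `cosh` for large arguments, and a finite-difference check confirms the derivative formulas.

```python
import numpy as np

def G(x, a=1.0):
    """G(x) = log(cosh(a x)) / a, evaluated without overflow for large |a x|."""
    u = a * np.asarray(x, dtype=float)
    return (np.logaddexp(u, -u) - np.log(2.0)) / a

def dG(x, a=1.0):
    """First derivative G'(x) = tanh(a x)."""
    return np.tanh(a * np.asarray(x, dtype=float))

def d2G(x, a=1.0):
    """Second derivative G''(x) = a (1 - tanh(a x)^2)."""
    return a * (1.0 - np.tanh(a * np.asarray(x, dtype=float)) ** 2)

x = np.linspace(-3.0, 3.0, 7)
h = 1e-5
err1 = np.max(np.abs((G(x + h, 1.5) - G(x - h, 1.5)) / (2 * h) - dG(x, 1.5)))
err2 = np.max(np.abs((dG(x + h, 1.5) - dG(x - h, 1.5)) / (2 * h) - d2G(x, 1.5)))
print(err1, err2)
```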



19.4.2.2 Approximate Newton-Like Parallel ICA Algorithm 

Following the same strategy of developing the approximate Newton one-unit ICA
algorithm $\phi_n$ (19.61) in Sect. 19.3, we now apply the idea of approximating the
Hessian of $F$ to develop a Newton-like parallel ICA method. The approximation is
given as follows

$$\mathcal{H}F(X)(X\Omega, X\Omega) \approx \sum_{1 \le i < j \le m} -\omega_{ij}^2\big(\eta(x_i) + \eta(x_j)\big). \tag{19.97}$$

Clearly, this choice may seriously differ from the true Hessian. However,
coinciding at the critical points $X^* \in \Theta$, it locally yields a good approximation.

Thus, by recalling Eq. 19.87, an approximate Newton direction $X\Omega \in T_X O(m)$
with $\Omega = (\omega_{ij})_{i,j=1}^{m} \in \mathfrak{so}(m)$ is explicitly computed by

$$\big(\eta(x_i) + \eta(x_j)\big)\omega_{ij} = x_i^\top k(x_j) - k(x_i)^\top x_j, \tag{19.98}$$

for all $1 \le i < j \le m$. According to the condition (19.95), one computes directly

$$\omega_{ij} = \frac{x_i^\top k(x_j) - k(x_i)^\top x_j}{\eta(x_i) + \eta(x_j)}. \tag{19.99}$$

The projection of $X\Omega$ onto $O(m)$ by using $\mu_X$ (19.86) completes an iteration of an
approximate Newton-like parallel ICA method. It is known that the calculation of
the matrix exponential of an arbitrary $\Omega \in \mathfrak{so}(m)$ requires an expensive iterative
process [17]. To overcome this computational issue, one can utilise a first order
approximation of the matrix exponential via a QR decomposition, which preserves
orthogonality. A similar idea has been explored in [3]. For a given $\Omega \in \mathfrak{so}(m)$, let
us compute the unique QR decomposition of the matrix $I_m + \Omega$. Since the
determinant of $I_m + \Omega$ is always positive, the QR decomposition is unique with
$\det(I_m + \Omega)_Q = 1$. We then define a second local parameterisation on $O(m)$ as

$$\nu_X\colon \mathfrak{so}(m) \to O(m), \qquad \nu_X(\Omega) := X(I_m + \Omega)_Q. \tag{19.100}$$

Substituting the approximate Newton step as computed in (19.99) into $\nu_X$ leads
to the following algorithmic map on $O(m)$

$$\Phi_n\colon O(m) \to O(m), \qquad X \mapsto X(I_m + \widetilde{\Omega}(X))_Q, \tag{19.101}$$

where $\widetilde{\Omega}\colon O(m) \to \mathfrak{so}(m)$ is the smooth map consisting of the following entry-wise
maps

$$\widetilde{\omega}_{ij}\colon O(m) \to \mathbb{R}, \qquad \widetilde{\omega}_{ij}(X) := \frac{x_i^\top k(x_j) - k(x_i)^\top x_j}{\eta(x_i) + \eta(x_j)}, \tag{19.102}$$

for all $1 \le i < j \le m$. Iterating the map $\Phi_n$ gives an approximate Newton-like
parallel ICA algorithm as follows.

Algorithm 3 Approximate Newton-like parallel ICA algorithm 

Step 1 Given an initial guess Z'"' ~ [x\' , . . .,x,„] e 0{m) and set ^ = 0. 

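Step 2 Compute $\Omega^{(k)} = (\omega_{ij}^{(k)})_{i,j=1}^{m} \in \mathfrak{so}(m)$ with

$$\omega_{ij}^{(k)} = \frac{\zeta(x_j^{(k)}, x_i^{(k)}) - \zeta(x_i^{(k)}, x_j^{(k)})}{\eta(x_i^{(k)}) + \eta(x_j^{(k)})}, \qquad \text{for } 1 \le i < j \le m,$$

where $\zeta(x_i, x_j) = E[G'(x_i^\top w)(x_j^\top w)]$, and
$\eta(x) = E[G'(x^\top w)(x^\top w)] - E[G''(x^\top w)]$.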

Step 3 Update $X^{(k+1)} = X^{(k)} (I_m + \Omega^{(k)})_Q$.

Step 4 If $\|X^{(k+1)} - X^{(k)}\|$ is small enough, stop.

Otherwise, set $k = k + 1$ and go to Step 2.
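
The steps of Algorithm 3 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: expectations are replaced by sample means over the columns of a whitened data matrix `W`, the nonlinearity is assumed to be $G(u) = \log\cosh(u)$ (so $G' = \tanh$ and $G'' = 1 - \tanh^2$), the entries of $\Omega$ follow the quotient form of (19.102), and all function names are hypothetical.

```python
import numpy as np

def q_factor(A):
    """Q-factor of the QR decomposition with positive diagonal in R."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def anpica_step(X, W):
    """One pass of Steps 2-3 for whitened data W (m x n) and X in O(m)."""
    m, n = W.shape
    S = X.T @ W                      # row i holds the estimated signal x_i^T w
    G1 = np.tanh(S)                  # G' for the assumed G(u) = log cosh(u)
    G2 = 1.0 - G1 ** 2               # G''
    K = (W @ G1.T) / n               # column i approximates k(x_i) = E[G'(x_i^T w) w]
    eta = np.mean(G1 * S, axis=1) - np.mean(G2, axis=1)
    M = X.T @ K                      # M[i, j] = x_i^T k(x_j)
    numerator = M - M.T              # x_i^T k(x_j) - k(x_i)^T x_j
    denominator = eta[:, None] + eta[None, :]
    Omega = np.zeros((m, m))
    off = ~np.eye(m, dtype=bool)     # only off-diagonal entries are needed
    Omega[off] = numerator[off] / denominator[off]
    return X @ q_factor(np.eye(m) + Omega), Omega

def anpica(W, X0, tol=1e-10, max_iter=100):
    """Iterate Steps 2-4 until successive iterates are close."""
    X = X0
    for _ in range(max_iter):
        X_new, _ = anpica_step(X, W)
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X
```

A call such as `anpica(W, X0)` with an orthogonal `X0` returns an orthogonal matrix by construction, because every update multiplies by a Q-factor; whether the iteration converges depends on the data and on condition (19.95) holding along the way.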

Remark 4 For the one-unit linear ICA problem, the scalar shifted algorithm and the
approximate Newton method are indeed identical, as shown in Sect. 19.3. For the
parallel linear ICA problem, however, the two resulting algorithms, i.e. the matrix
shifted QR-ICA algorithm (Algorithm 2) and the approximate Newton-like parallel ICA
algorithm (Algorithm 3), are different. Experimental results can be found in Sect. 19.5.

19.4.2.3 Local Convergence Properties 

Finally, the local convergence properties of the algorithmic map $\Phi_n$ are
characterised by the following two results.

Lemma 1.9 Let $\mathcal{S} \subset O(m)$, as defined in (19.8), be the set of solutions of
the parallel linear ICA problem (19.6) and $G : \mathbb{R} \to \mathbb{R}$ be a smooth
function satisfying the condition (19.95). Then any $X^* \in \mathcal{S}$ is a fixed point
of the algorithmic map $\Phi_n$.

Proof Following Eq. 19.95, one knows that the denominator in (19.102) is nonzero.
Therefore Eq. 19.89 yields

$$\omega_{ij}(X^*) = 0, \tag{19.103}$$

for all $1 \le i < j \le m$. The result follows from $(I_m)_Q = I_m$. □

Theorem 1.10 Let $\mathcal{S} \subset O(m)$, as defined in (19.8), be the set of solutions of
the parallel linear ICA problem (19.6) and $G : \mathbb{R} \to \mathbb{R}$ be a smooth
function satisfying the condition (19.95). Then the algorithmic map $\Phi_n$ is locally
quadratically convergent to $X^* \in \mathcal{S}$.

Proof Following the argument in [15], Lemmas 2.7-2.9, we will show that the
first derivative of the algorithmic map $\Phi_n$

$$\mathrm{D}\,\Phi_n(X) : T_X O(m) \to T_{\Phi_n(X)} O(m) \tag{19.104}$$

vanishes at a fixed point $X^*$.

Recalling the results of Lemma 1 in [16], we compute

$$\mathrm{D}\,\Phi_n(X^*)\,\Xi = \Xi + X^*\, \mathrm{D}\,\Omega(X^*)\,\Xi, \tag{19.105}$$

where $\Xi = X^* \Omega \in T_{X^*} O(m)$. Thus, showing that the first derivative of
$\Phi_n$ vanishes at $X^*$ is equivalent to showing

$$\mathrm{D}\,\Omega(X^*)\,\Xi = -\Omega, \tag{19.106}$$

which indeed is equivalent to

$$\mathrm{D}\,\omega_{ij}(X^*)\,\Xi = -\omega_{ij}, \tag{19.107}$$

for all $1 \le i < j \le m$. We compute the first derivative of $\omega_{ij}$ (19.102)



Da)ij{X*)E = — ffly(X* exp(fQ)) 



r=0 






(19.108) 



It is clear that the first summand vanishes following the previous results, see 
(19.89). By using (19.25) and (19.51), we compute 



D[kix*)'x]-xrkix])jE 

= {Dkix^)EyxJ + k{x';)''q - ^Jk{x]) - x*^Dkix*)E 
= E[G"(.9,(,-))]coTz*Tx; +E[G'(^,(,■)).9,(,■)]x*Tx*aJ; 
- E[G'{s,^))s,^f^]ojJX*^x* - E[G"(^,(,.))KX*a,,- 
= E[G"(.9,(,-))]cOy +E[G'(^,(;)).5,(,-)]c«,-,- 

- HG' {s,Q))s,(j-j]mij - E[G" {s,^i^)]coji 
= (E[G"(^,(;))] -E[G'(5,(,-))^,(,.,] +E[G"(^,(,-))] -E[G'(^,y)).9,(,■)])ffly 
=-('?(4)+'7(^;))ffl'y• (19.109) 

Applying now Eq. 19.108 yields Eq. 19.107. The result follows. □



19.5 Numerical Experiments 

In this section, the performance of all presented methods is demonstrated and
compared by using several numerical experiments. We apply all algorithms to an
audio signal separation dataset provided by the Brain Science Institute, RIKEN,
see http://www.bsp.brain.riken.jp/data. The dataset consists of 200 natural speech
signals sampled at 4 kHz with 20,000 samples per signal.
We know that the FastICA algorithm only recovers one single source. In
order to extract all source signals, a deflationary FastICA approach using the
Gram-Schmidt orthogonalisation has already been developed in [11]. In our
experiment, to avoid dealing with the sign flipping issue, we implement the map
(19.61) instead of the FastICA map (19.15). Nevertheless, we still refer to
this method as FastICA-Defl. It is important to notice that, for a linear ICA
problem of extracting $m$ sources, after computing the second last column $x_{m-1}$ of
the demixing matrix $X \in O(m)$, the last column $x_m$ is already uniquely determined
up to sign by the Gram-Schmidt process. In other words, one only needs to apply
$m - 1$ deflationary FastICA procedures to extract $m$ signals. For the same reason, in
implementing the matrix shifted QR-ICA algorithm $\Phi_s$ (19.73), one can skip the
update on the last column, i.e., for each iteration, we do the following

$f:OH^O(m), ZK^([fff(^i),...,(Tf(x„,_i),x„])Q. (19.110) 

In the sequel, we call it QR-ICA, and refer to the approximate Newton-like parallel
ICA method $\Phi_n$ (19.101) as ANPICA.

All methods developed in this paper involve evaluations of the functions $k$
(19.21) and $\rho_f$ (19.54) at certain columns of the demixing matrix $X \in O(m)$. It is
then sensible to utilise the number of column-wise evaluations of $k$ and $\rho_f$ as a
basic computational unit to compare these methods. Let us denote the sample size
of each mixture by $n$, which is usually much greater than the number of signals $m$.
In all our experiments, we fix the sample size to $n = 10^4$. Obviously, the
computational burden of these methods is mainly due to the evaluations of $k$ and
$\rho_f$, which depend on a huge data matrix $W \in \mathbb{R}^{m \times n}$. For a given
$x \in S^{m-1}$, a reconstruction of the corresponding estimated signal firstly involves
$m \cdot n$ scalar multiplications and $(m - 1) \cdot n$ scalar additions. Consequently,
computing $k(x)$ and $\rho_f(x)$ requires $n$ evaluations of $G'$ and $G''$ on the
estimated signal together with another $m \cdot n$ scalar multiplications and
$m \cdot (n - 1) + n$ scalar additions. Finally, the major computational cost for
evaluating $k(x)$ and $\rho_f(x)$ can be summarised in the following Table 19.1.

Here the notations $\otimes$ and $\oplus$ represent the scalar multiplication and
addition, respectively. Thus by counting in the factor of the number of columns, the
complexity of all three methods is roughly of the same order, i.e., of $O(m^2 n)$.
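
As a quick check of the operation counts quoted above, the two stages (reconstructing the estimated signal, then forming the two quantities from it) can be added up explicitly. This is only a worked restatement of the figures in the text:

$$\underbrace{m n}_{\text{signal}} + \underbrace{m n}_{\text{evaluation}} = 2 m n \quad \text{multiplications}, \qquad \underbrace{(m-1)\,n}_{\text{signal}} + \underbrace{m\,(n-1) + n}_{\text{evaluation}} = 2 m n - m \quad \text{additions}.$$

Each column-wise evaluation therefore costs $O(mn)$ arithmetic operations plus $n$ evaluations each of $G'$ and $G''$, and a sweep over $m$ columns gives the $O(m^2 n)$ order stated above.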

The task of our first experiment is to separate $m = 3$ signals which are randomly
chosen out of 200 source speeches. All three methods are initialised by the
same point on $O(m)$. Figure 19.1 demonstrates the results from a single experiment.
It shows that all methods can successfully recover all source signals up to a



Table 19.1 Major computational costs of evaluating $k(x)$ (19.21) and $\rho_f(x)$ (19.54)

Operation    $\otimes$       $\oplus$           $G'$    $G''$
Times        $2m \cdot n$    $2m \cdot n - m$   $n$     $n$


Fig. 19.1 A simple example of three audio signals; c-e are the signals estimated by
FastICA-Defl, QR-ICA and ANPICA



sign and an arbitrary permutation, see Fig. 19.1c-e. It is worthwhile to notice that
the first signals extracted by FastICA-Defl and QR-ICA are always identical, since
the first column of QR-ICA evolves exactly as the first column of FastICA-Defl does.
Finally, we will compare the performance of all presented methods in terms of
both separation quality and convergence speed. The separation performance is
measured by the average signal-to-interference-ratio (SIR) index [21], i.e.,

$$\mathrm{SIR}(Z) := \frac{1}{m} \sum_{i=1}^{m} 10 \log_{10} \frac{\max_j z_{ij}^2}{\sum_{j=1}^{m} z_{ij}^2 - \max_j z_{ij}^2}, \tag{19.111}$$

where $Z = (z_{ij})_{i,j=1}^{m} = X^\top V$, and $V, X \in O(m)$ are the mixing matrix and the
computed demixing matrix, respectively. In general, the greater the SIR index, the
better the separation. The convergence speed is measured by comparing
the elapsed CPU time required by each algorithm to reach the same level of error
$\|X^{(k)} - X^*\|_F < \epsilon$. Here $\|X^{(k)} - X^*\|_F$ denotes the Frobenius norm of the
difference between the terminated demixing matrix $X^* \in O(m)$ and the $k$-th iterate
$X^{(k)} \in O(m)$. Since



|X«-Xl 



Eh"' 



(19.112) 



the column-wise stop criterion for FastlCA-Defl is chosen to be ||x**^ 
e/m. In our experiments, we set e = 10"'". 
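
The average SIR index can be computed in a few lines. The sketch below assumes the index has the form "largest squared entry of each row of $Z = X^\top V$ against the remaining squared entries, in decibels, averaged over rows"; the function name `sir_index` is hypothetical and the exact normalisation used in [21] should be checked against that reference.

```python
import numpy as np

def sir_index(X, V):
    """Average SIR (dB) of Z = X^T V under the assumed definition above."""
    Z2 = (X.T @ V) ** 2
    peak = Z2.max(axis=1)                  # energy of the dominant source per row
    interference = Z2.sum(axis=1) - peak   # energy leaking in from other sources
    return float(np.mean(10.0 * np.log10(peak / interference)))

# Two sources, demixing matrix off by a small rotation angle theta.
theta = 0.1
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = np.eye(2)
print(sir_index(X, V))
```

For a perfect separation the interference term is zero and the index is infinite, so in practice the value is only meaningful for the finite-precision estimates produced by the algorithms.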
We first apply all three algorithms to extract $m = 5$ randomly chosen speech
signals. By replicating the experiment 100 times, the boxplots of the SIR index and
the elapsed CPU time (seconds) required by each algorithm are drawn in
Fig. 19.2a, b, respectively. Figure 19.2a shows that ANPICA outperforms both
FastICA-Defl and QR-ICA in terms of separation quality, and that FastICA-Defl
and QR-ICA perform equally well. In terms of convergence speed, Fig. 19.2b
indicates that FastICA-Defl is the fastest algorithm to reach the prechosen error
level, while QR-ICA takes the longest time to converge. By counting the total
number of column-wise evaluations for each method until convergence, as shown
in Fig. 19.2c, the average elapsed CPU time per column-wise evaluation in each
method is computed in Fig. 19.2d. It shows that ANPICA requires the largest
number of column-wise evaluations.

Finally, we further apply all methods to extract $m = 10$, then 15, and finally 20
speech signals. Similar boxplots are produced in Figs. 19.3, 19.4, and 19.5. One
can see that, in terms of separation quality, ANPICA outperforms the other two
methods consistently, and the margin becomes more significant as the number of
signals increases. FastICA-Defl and QR-ICA consistently perform equally well.
Comparing the convergence speed, it is no surprise that FastICA-Defl is still the
fastest as well as the cheapest of the three algorithms. We also observe that, when
the number of signals is big enough, QR-ICA requires more column-wise evaluations
than ANPICA does, i.e., QR-ICA becomes the slowest algorithm among all three,
see Fig. 19.5c.

To summarise, FastICA-Defl is the simplest and fastest algorithm to solve a
linear ICA problem. However, it is not able to extract signals simultaneously.
ANPICA, in contrast, gave the best separation quality for the parallel linear ICA
problem in all of our experiments.



19.6 Conclusions 

In this paper, we study the FastICA algorithm, a classic method for solving the 
one-unit linear ICA problem. After reinterpreting FastICA in the framework of a 
scalar shift strategy and an approximate Newton method, we generalise FastICA to 
solve the parallel linear ICA problem by using a matrix shift strategy and an 
approximate Newton method. The work presented in this paper also demonstrates 
similarities in terms of analysis and generalisations between the FastICA algo- 
rithms and the RQI method in numerical linear algebra. 

Recently, the present authors have successfully generalised FastICA to solve
the problem of independent subspace analysis (ISA) [22]. The key ideas are again
similar to developing the so-called Graßmann-RQI algorithm [2], a generalisation
of RQI for computing invariant subspaces.
Chapter 20

On Computing Minimal Proper Nullspace Bases with Applications in Fault Detection



Andras Varga 



Abstract We discuss computationally efficient and numerically reliable algo- 
rithms to compute minimal proper nullspace bases of a rational or polynomial 
matrix. The underlying main computational tool is the orthogonal reduction to a 
Kronecker-like form of the system matrix of an equivalent descriptor system 
realization. A new algorithm is proposed to compute a simple minimal proper 
nullspace basis, starting from a non-simple one. Minimal dynamic cover based 
computational techniques are used for this purpose. The discussed methods allow a 
high flexibility in addressing several fault detection related applications. 



20.1 Introduction 

Consider a $p \times m$ rational matrix $G(\lambda)$, where the indeterminate $\lambda$ is
generally a complex variable. If we interpret $G(\lambda)$ as the transfer-function matrix
(TFM) of a (generalized) linear time-invariant system, then according to the system
type, $\lambda$ is the $s$ variable in the Laplace transform in the case of a continuous-time
system or $\lambda$ is the $z$ variable in the Z-transform in the case of a discrete-time
system. This interpretation of $\lambda$ is relevant when system stability aspects are
considered.

In this paper we address the following computational problem: For a given
$p \times m$ rational or polynomial matrix $G(\lambda)$ with normal rank $r$, determine a
$(p - r) \times p$ rational basis matrix $N_l(\lambda)$ of the left nullspace of $G(\lambda)$ such that

$$N_l(\lambda)\, G(\lambda) = 0.$$
Of special importance are minimal bases having the least achievable McMillan
degree. Moreover, depending on the underlying application, further properties may
be desirable, as for example, determining $N_l(\lambda)$ as a polynomial matrix or as a
proper rational matrix with specified poles.

The rigorous study of polynomial bases started with the theoretical works of
Forney [4], followed by initial algorithmic developments by Kailath [7]. For
the computation of a minimal polynomial basis of a polynomial matrix $G(\lambda)$ there
are many algorithms, see [1] and the literature cited therein. Two main classes of
methods are the resultant methods, which determine the solution by directly solving
polynomial equations involving appropriate resultant matrices [1], and
pencil methods, which rely on matrix pencil reduction algorithms [2]. While
resultant methods can be a real alternative to the unreliable polynomial
manipulation based methods proposed in [7], their application to rational matrices
defined implicitly via state space system realizations requires, as a supplementary
step, bringing the system model into a polynomial representation. This involves
factoring $G(\lambda)$ as $G(\lambda) = N(\lambda) M^{-1}(\lambda)$, where $N(\lambda)$ and $M(\lambda)$ are
polynomial matrices, and applying the method to $N(\lambda)$. The converse operation
(e.g., proper rational factoring of a polynomial matrix) could also be necessary, if
the desired basis must be a proper rational basis. Such computational detours are
generally considered highly unreliable for TFMs of large scale systems (which
usually arise in a state-space form).

The pencil methods work directly on the state space realization of $G(\lambda)$, and
are applicable to both polynomial and rational matrices. The main computational
tool is the reduction of a matrix pencil to a Kronecker-like form using orthogonal
transformations. The left Kronecker structure provides the complete information to
compute a polynomial basis via straightforward matrix and polynomial matrix
manipulations [2].

For many applications, proper rational bases are required. Such bases can be 
immediately obtained from polynomial bases. However, to avoid potentially 
unstable polynomial manipulations, it is of interest to compute proper rational 
bases directly, without the unnecessary detour of determining first polynomial 
bases. The theory of proper rational bases has been developed in [13], where the 
main concepts have been also defined. Of special importance are proper bases 
which are simple (see the exact definition in the next section), representing a direct 
generalization of minimal polynomial bases. A first reliable numerical method to 
compute proper rational bases has been proposed by the author in [20]. This 
method belongs to the class of pencil methods and its main advantage is that a 
minimal proper rational basis can be computed by using exclusively orthogonal 
transformations. Note however, that the resulting basis is generally not simple. 

In this paper we extend the algorithm of [20] to compute simple minimal proper
rational bases. The new algorithm can be seen as a post-processing method which
determines a simple basis starting from a non-simple one. Minimal dynamic
cover techniques are used for this purpose. The proposed new algorithm makes it
easy to perform operations with the resulting basis which are of importance in
applications such as those encountered in fault detection. For example, computing
linear combinations of basis vectors immediately leads to candidate solutions of
the fault detection problem with a least order detector. Several applications in
solving fault detection problems are discussed in a separate section.



20.2 Nullspace Bases 

Since polynomial bases represent an important tool in defining the corresponding
concepts for the more general rational bases, we will briefly recall some of the
main results of [4]. Assume that $N_l(\lambda)$ is a polynomial basis of the left nullspace of
$G(\lambda)$. Let us denote by $n_i$ the $i$th index (or degree), representing the greatest
degree of the $i$th row of $N_l(\lambda)$. Then, the order of $N_l(\lambda)$ is defined as
$n_l = \sum_{i=1}^{p-r} n_i$ (i.e., the sum of row degrees). A minimal basis is one which has
least order among all polynomial bases. The indices of a minimal basis are called
minimal indices. The order of a minimal polynomial basis $N_l(\lambda)$ is equal to the
McMillan degree of $N_l(\lambda)$. Some properties of a minimal basis are summarized
below [4, 7]:

Theorem 1 Let $N_l(\lambda)$ be a minimal polynomial basis of the left nullspace of $G(\lambda)$
with row indices $n_i$, $i = 1, \ldots, p - r$. Then the following holds:

1. The row indices are unique up to permutations (i.e., if $\widetilde{N}_l(\lambda)$ is another
minimal basis, then $N_l(\lambda)$ and $\widetilde{N}_l(\lambda)$ have the same minimal indices).

2. The minimal indices are the left Kronecker indices of $G(\lambda)$.

3. $N_l(\lambda)$ is irreducible, i.e., has full row rank for all $\lambda \in \mathbb{C}$ ($N_l(\lambda)$ has no
finite or infinite zeros).

4. $N_l(\lambda)$ is row reduced, i.e., the leading row coefficient matrix (formed from the
coefficients of the highest row degrees) has full row rank.

If $M_l(\lambda)$ is a non-singular rational matrix, then $\widetilde{N}_l(\lambda) := M_l(\lambda) N_l(\lambda)$
is also a left nullspace basis. Frequently the matrices $M_l(\lambda)$ originate from
appropriate left coprime factorizations of an original basis $N_l(\lambda)$ in the form

$$N_l(\lambda) = M_l(\lambda)^{-1}\, \widetilde{N}_l(\lambda), \tag{20.1}$$

where the factors $M_l(\lambda)$ and $\widetilde{N}_l(\lambda)$ can be chosen to satisfy special requirements
(e.g., have only poles in a certain "good" region of the complex plane).

The main advantage of minimal polynomial bases is the possibility to easily
build proper minimal rational bases. These are proper rational bases having the
least McMillan degree $n_l$. A proper minimal rational basis with arbitrary poles can
be simply constructed by taking

$$M_l(\lambda) = \mathrm{diag}\left( \frac{1}{m_1(\lambda)}, \ldots, \frac{1}{m_{p-r}(\lambda)} \right), \tag{20.2}$$

where $m_i(\lambda)$ is a polynomial of degree $n_i$, and forming
$\widetilde{N}_l(\lambda) := M_l(\lambda) N_l(\lambda)$. The resulting basis $\widetilde{N}_l(\lambda)$ has the additional
property that the order of any minimal state space realization of $\widetilde{N}_l(\lambda)$ is equal
to the sum of orders of the minimal state space realizations of the rows of
$\widetilde{N}_l(\lambda)$. Furthermore, $\widetilde{D}_l := \lim_{\lambda \to \infty} \widetilde{N}_l(\lambda)$ has full row
rank. Such a proper basis is termed simple [13] and is the natural counterpart of
a minimal polynomial basis introduced in [4].
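
A small worked example may help to fix ideas; it is a hypothetical illustration and not taken from the source. Take $p = 2$, $m = 1$ and the polynomial matrix below, which has normal rank $r = 1$, so a left nullspace basis has $p - r = 1$ row:

$$G(\lambda) = \begin{bmatrix} \lambda \\ 1 \end{bmatrix}, \qquad N_l(\lambda) = \begin{bmatrix} 1 & -\lambda \end{bmatrix}, \qquad N_l(\lambda)\, G(\lambda) = \lambda - \lambda = 0 .$$

The single row index is $n_1 = 1$. Choosing, for instance, $m_1(\lambda) = \lambda + 1$ in (20.2) gives

$$\widetilde{N}_l(\lambda) = \frac{1}{\lambda + 1} \begin{bmatrix} 1 & -\lambda \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\lambda+1} & \dfrac{-\lambda}{\lambda+1} \end{bmatrix}, \qquad \lim_{\lambda \to \infty} \widetilde{N}_l(\lambda) = \begin{bmatrix} 0 & -1 \end{bmatrix},$$

which is proper, has McMillan degree $1 = n_1$, a pole at the freely chosen location $\lambda = -1$, and a full row rank limit at infinity.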

20.3 Computation of Minimal Proper Bases 

For the computation of a rational nullspace basis $N_l(\lambda)$ a pencil method based on a
state space representation of $G(\lambda)$ has been proposed in [20]. In this section we
review this algorithm and give some of the properties of the resulting basis.
Although minimal, it appears that the resulting minimal basis is not simple. An
approach to obtain simple bases is presented in the next section.

The $p \times m$ rational matrix $G(\lambda)$ can be realized as a descriptor system

$$G(\lambda) = \left[ \begin{array}{c|c} A - \lambda E & B \\ \hline C & D \end{array} \right], \tag{20.3}$$

which is an equivalent notation for

$$G(\lambda) = C (\lambda E - A)^{-1} B + D.$$

We call this realization irreducible if the pair $(A - \lambda E, B)$ is controllable (i.e.,
$\mathrm{rank}\,[\,A - \lambda E \;\; B\,] = n$ for all $\lambda \in \mathbb{C}$) and the pair $(A - \lambda E, C)$ is
observable (i.e., $\mathrm{rank}\,[\,A^\top - \lambda E^\top \;\; C^\top\,] = n$ for all $\lambda \in \mathbb{C}$) [12],
where $n$ is the order of the square matrix $A$.

The computational method described in [20] exploits the simple fact that $N_l(\lambda)$
is a left nullspace basis of $G(\lambda)$ if and only if for a suitable $M_l(\lambda)$

$$Y_l(\lambda) := [\, M_l(\lambda) \;\; N_l(\lambda) \,] \tag{20.4}$$

is a left nullspace basis of the system matrix

$$S(\lambda) = \begin{bmatrix} A - \lambda E & B \\ C & D \end{bmatrix}. \tag{20.5}$$

Thus, to compute $N_l(\lambda)$ we can determine first a left nullspace basis $Y_l(\lambda)$ for
$S(\lambda)$ and then $N_l(\lambda)$ simply results as

$$N_l(\lambda) = Y_l(\lambda) \begin{bmatrix} 0 \\ I_p \end{bmatrix}.$$

$Y_l(\lambda)$ and thus also $N_l(\lambda)$ can be computed by employing linear pencil reduction
algorithms based on orthogonal transformations. The main advantage of this
approach is that the computation of the nullspace can entirely be done by
manipulating state space matrices instead of manipulating polynomial models.
The resulting nullspace is obtained in a descriptor system representation which can
be immediately used in applications. In what follows we give some details of this
approach.
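
The link between the left nullspace of the system matrix and that of the transfer-function matrix can be illustrated pointwise, at a single frequency. The sketch below is not the pencil reduction algorithm of [20]: it simply evaluates the system matrix at one complex value of $\lambda$, extracts a left null vector with an SVD, and checks that its last $p$ entries annihilate $G(\lambda)$. The small realization and the helper names `tfm` and `left_nullspace` are hypothetical.

```python
import numpy as np

def tfm(A, E, B, C, D, lam):
    """Evaluate G(lam) = C (lam E - A)^{-1} B + D."""
    return C @ np.linalg.solve(lam * E - A, B) + D

def left_nullspace(M, tol=1e-10):
    """Rows spanning the left nullspace of M, obtained from the SVD."""
    U, s, _ = np.linalg.svd(M)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return U[:, rank:].conj().T

# Hypothetical realization with n = 1 state, m = 1 input, p = 2 outputs.
A = np.array([[-1.0]])
E = np.array([[1.0]])
B = np.array([[1.0]])
C = np.array([[1.0], [2.0]])
D = np.array([[0.0], [1.0]])

lam = 0.5 + 2.0j                       # any point that is not a pole
n, p = A.shape[0], C.shape[0]
S = np.block([[A - lam * E, B], [C, D]])
Y = left_nullspace(S)                  # Y = [M_l  N_l] evaluated at lam
N = Y[:, n:]                           # keep the last p columns
G = tfm(A, E, B, C, D, lam)
print(np.abs(N @ G).max())
```

The reliable algorithms discussed in the text obtain the same object for all $\lambda$ at once, as a descriptor realization, rather than point by point.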

Let $Q$ and $Z$ be orthogonal matrices (for instance, determined by using the
algorithms of [2, 17]) such that the transformed pencil $\bar{S}(\lambda) := Q S(\lambda) Z$ is in
the Kronecker-like staircase form

$$\bar{S}(\lambda) = \begin{bmatrix} A_r - \lambda E_r & A_{r,l} - \lambda E_{r,l} \\ 0 & A_l - \lambda E_l \\ 0 & C_l \end{bmatrix}, \tag{20.6}$$

where the descriptor pair $(A_l - \lambda E_l, C_l)$ is observable, $E_l$ is non-singular, and
$A_r - \lambda E_r$ has full row rank excepting possibly a finite set of values of $\lambda$ (i.e.,
the invariant zeros of $S(\lambda)$). It follows that we can choose the nullspace
$\bar{Y}_l(\lambda)$ of $\bar{S}(\lambda)$ in the form

$$\bar{Y}_l(\lambda) = [\, 0 \;|\; C_l (\lambda E_l - A_l)^{-1} \;|\; I \,]. \tag{20.7}$$

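Then the left nullspace of $S(\lambda)$ is $Y_l(\lambda) = \bar{Y}_l(\lambda) Q$ and can be obtained
easily after partitioning suitably $Q$ as

$$Q = \begin{bmatrix} \widehat{B}_{r,l} & B_{r,l} \\ \widehat{B}_l & B_l \\ \widehat{D}_l & D_l \end{bmatrix},$$

where the row partitioning corresponds to the column partitioning of $\bar{Y}_l(\lambda)$ in
(20.7), while the column partitioning corresponds to the row partitioning of $S(\lambda)$ in
(20.5). We obtain

$$Y_l(\lambda) = \left[ \begin{array}{c|cc} A_l - \lambda E_l & \widehat{B}_l & B_l \\ \hline C_l & \widehat{D}_l & D_l \end{array} \right] \tag{20.8}$$

and the nullspace of $G(\lambda)$ is

$$N_l(\lambda) = \left[ \begin{array}{c|c} A_l - \lambda E_l & B_l \\ \hline C_l & D_l \end{array} \right]. \tag{20.9}$$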



To obtain this representation of the nullspace basis, we performed exclusively
orthogonal transformations on the system matrices. We can prove that all computed
matrices are exact for a slightly perturbed original system matrix. It follows
that the algorithm to compute the nullspace basis is numerically backward stable.

For an irreducible realization (20.3) of $G(\lambda)$, the full column rank subpencil

$$\begin{bmatrix} A_l - \lambda E_l \\ C_l \end{bmatrix}$$

defines also the left Kronecker structure of $G(\lambda)$ [12]. In our case, for
$p > m$ this result can be relaxed asking only for controllability of the realization
(20.3). Indeed, it can be easily verified that all unobservable eigenvalues of
$A - \lambda E$ appear either as invariant zeros or in the right Kronecker structure and thus
do not affect the left Kronecker structure of the system pencil $S(\lambda)$. This is no
longer true in the case when the realization (20.3) is not controllable. In this case, a
part of the uncontrollable eigenvalues may appear as invariant zeros, while the rest
of them enter in $A_l - \lambda E_l$, thus affecting the left Kronecker structure.

It is possible to obtain the subpencil characterizing the left structure in an
observability staircase form

$$\begin{bmatrix} A_l - \lambda E_l \\ C_l \end{bmatrix} = \begin{bmatrix} A_{\ell,\ell+1} & A_{\ell,\ell} - \lambda E_{\ell,\ell} & \cdots & A_{\ell,1} - \lambda E_{\ell,1} \\ & A_{\ell-1,\ell} & \cdots & A_{\ell-1,1} - \lambda E_{\ell-1,1} \\ & & \ddots & \vdots \\ & & & A_{1,1} - \lambda E_{1,1} \\ & & & A_{0,1} \end{bmatrix}, \tag{20.10}$$

where $A_{i,i+1} \in \mathbb{R}^{\mu_i \times \mu_{i+1}}$, with $\mu_{\ell+1} = 0$, are full column rank upper
triangular matrices, for $i = 0, \ldots, \ell$. Note that this form is automatically obtained
by using the pencil reduction algorithms described in [2, 17]. The left (or row)
Kronecker indices result as follows: there are $\mu_{i-1} - \mu_i$ Kronecker blocks of size
$i \times (i - 1)$, for $i = 1, \ldots, \ell + 1$. The row dimension of $N_l(\lambda)$ (i.e., the number of
linearly independent basis vectors) is given by the total number of Kronecker
indices, thus $\sum_{i=1}^{\ell+1} (\mu_{i-1} - \mu_i) = \mu_0$. Applying standard linear algebra
results, it follows that $\mu_0 = p - r$.
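
Reading the left Kronecker indices off the staircase dimensions is simple bookkeeping, which the following sketch makes explicit. The function name `left_kronecker_indices` and the sample dimensions are hypothetical; the rule implemented is the one stated above, namely $\mu_{i-1} - \mu_i$ basis vectors of degree $i - 1$ for $i = 1, \ldots, \ell + 1$ with $\mu_{\ell+1} = 0$.

```python
def left_kronecker_indices(mu):
    """Return the list of left Kronecker indices for mu = [mu_0, ..., mu_ell]."""
    dims = list(mu) + [0]            # append mu_{ell+1} = 0
    indices = []
    for i in range(1, len(dims)):
        count = dims[i - 1] - dims[i]
        if count < 0:
            raise ValueError("staircase dimensions must be non-increasing")
        indices.extend([i - 1] * count)   # 'count' basis vectors of degree i - 1
    return indices

# Hypothetical staircase dimensions mu_0 = 3, mu_1 = 2, mu_2 = 1.
mu = [3, 2, 1]
indices = left_kronecker_indices(mu)
print(indices, len(indices), sum(indices))
```

The number of indices equals $\mu_0$, and their sum equals $\mu_1 + \cdots + \mu_\ell$, which is the order of the realization of the computed basis.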

We now give some properties of the computed rational basis.

Theorem 2 If the realization (20.3) of $G(\lambda)$ is controllable, then the rational
matrix $N_l(\lambda)$ defined in (20.9) is a minimal proper rational basis of the left
nullspace of $G(\lambda)$.

Proof According to the definition of a minimal proper rational basis [4, 13], its
McMillan degree is given by the sum of row indices of a minimal polynomial
basis. The order of the computed basis in (20.9) is

$$n_l = \sum_{i=1}^{\ell} \mu_i.$$

We have to show that this order is the same as that of an equivalent minimal
polynomial basis.

The controllability of the realization (20.3) ensures that the left Kronecker
structure of $G(\lambda)$ and of $S(\lambda)$ are characterized by the same Kronecker indices.
Instead of the rational basis $\bar{Y}_l(\lambda)$ in (20.7), we can directly compute a minimal
polynomial basis of the form

$$\bar{Y}_l(\lambda) = [\, 0 \;|\; \bar{N}_l(\lambda) \,], \tag{20.11}$$

where $\bar{N}_l(\lambda)$ is a minimal polynomial basis for the left nullspace of
$\begin{bmatrix} A_l - \lambda E_l \\ C_l \end{bmatrix}$. For this purpose, we can exploit the staircase form
(20.10). Using the staircase form (20.10), it is shown in [2] in a dual context that a
minimal polynomial basis can be computed by selecting $\mu_{i-1} - \mu_i$ polynomial basis
vectors of degree $i - 1$, for $i = 1, \ldots, \ell + 1$. This basis can be used to construct a
minimal rational basis by making each row proper with appropriate order
denominators (as shown in Sect. 20.2). The total order of such a basis is

$$n_l = \sum_{i=1}^{\ell+1} (\mu_{i-1} - \mu_i)(i - 1).$$

But this is exactly $n_l$, since

$$n_l = \sum_{i=1}^{\ell+1} \mu_{i-1} (i - 1) - \sum_{i=1}^{\ell+1} \mu_i (i - 1) = \sum_{i=1}^{\ell} \mu_i\, i - \sum_{i=1}^{\ell} \mu_i (i - 1) = \sum_{i=1}^{\ell} \mu_i.$$



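To finish this part of the proof, we need to show additionally that the realization
(20.9) is minimal. The pair $(A_l - \lambda E_l, C_l)$ is observable, by the construction of the
Kronecker-like form (20.6). To show that the pair $(A_l - \lambda E_l, B_l)$ is controllable,
observe that due to the controllability of the pair $(A - \lambda E, B)$, the sub-pencil
$[\,A - \lambda E \;\; B\,]$ has full row rank, and thus the reduced pencil

$$Q \begin{bmatrix} A - \lambda E & B & 0 \\ C & D & I_p \end{bmatrix} \begin{bmatrix} Z & 0 \\ 0 & I_p \end{bmatrix} = \begin{bmatrix} A_r - \lambda E_r & A_{r,l} - \lambda E_{r,l} & B_{r,l} \\ 0 & A_l - \lambda E_l & B_l \\ 0 & C_l & D_l \end{bmatrix}$$

has full row rank as well. It follows that

$$\mathrm{rank}\,[\,A_l - \lambda E_l \;\; B_l\,] = n_l$$

and thus the pair $(A_l - \lambda E_l, B_l)$ is controllable.
Since we also have that

$$\mathrm{rank} \begin{bmatrix} A_l - \lambda E_l & B_l \\ C_l & D_l \end{bmatrix} = n_l + p - r$$

for all $\lambda$, it follows that $N_l(\lambda)$ has no finite or infinite zeros. Thus, $D_l$ has full
row rank $p - r$ and the computed basis is column reduced at $\lambda = \infty$ [13]. □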

In the case when the realization (20.3) of $G(\lambda)$ is not controllable, the
realization of $N_l(\lambda)$ is not guaranteed to be controllable. The uncontrollable
eigenvalues of $A - \lambda E$ appear partly as invariant zeros (i.e., as part of the
sub-pencil $A_r - \lambda E_r$) and partly as part of the sub-pencil $A_l - \lambda E_l$. Therefore, in
this case, the resulting nullspace basis does not have the least possible McMillan
degree.
Additionally the following important result holds:

Proposition 1 If the realization (20.3) of $G(\lambda)$ is controllable, then the realization
of $N_l(\lambda)$ defined in (20.9) is maximally controllable.

Proof According to a dual formulation of [10], we have to show that for an
arbitrary output injection matrix $K$, the pair $(A_l + K C_l - \lambda E_l, B_l + K D_l)$ remains
controllable. Consider the transformation matrix

$$U = \begin{bmatrix} I & 0 & 0 \\ 0 & I & K \\ 0 & 0 & I \end{bmatrix} \tag{20.12}$$

and compute $\widetilde{S}(\lambda) := U Q S(\lambda) Z$, which, due to the particular form of $U$, is
still in the Kronecker-like staircase form

$$\widetilde{S}(\lambda) = \begin{bmatrix} A_r - \lambda E_r & A_{r,l} - \lambda E_{r,l} \\ 0 & A_l + K C_l - \lambda E_l \\ 0 & C_l \end{bmatrix}. \tag{20.13}$$

If we form also

$$U Q = \begin{bmatrix} \widehat{B}_{r,l} & B_{r,l} \\ \widehat{B}_l + K \widehat{D}_l & B_l + K D_l \\ \widehat{D}_l & D_l \end{bmatrix},$$

we obtain an alternative minimal proper rational basis in the form

$$\widetilde{N}_l(\lambda) = \left[ \begin{array}{c|c} A_l + K C_l - \lambda E_l & B_l + K D_l \\ \hline C_l & D_l \end{array} \right]. \tag{20.14}$$

We have already proven in Theorem 2 that such a nullspace basis is a minimal
realization. Thus, the pair $(A_l + K C_l - \lambda E_l, B_l + K D_l)$ is controllable. □
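
One way to see why output injection produces another left annihilator is the algebraic identity $\widetilde{N}_l(\lambda) = M(\lambda) N_l(\lambda)$ with $M(\lambda) = I + C_l (\lambda E_l - A_l - K C_l)^{-1} K$, which follows from the resolvent identity and holds for any realization. The sketch below checks it numerically at one point; the random matrices are purely hypothetical stand-ins for $(A_l, E_l, B_l, C_l, D_l, K)$ and are not a nullspace basis of any particular $G(\lambda)$.

```python
import numpy as np

def basis_value(A, E, B, C, D, lam):
    """Evaluate C (lam E - A)^{-1} B + D at the point lam."""
    return C @ np.linalg.solve(lam * E - A, B) + D

rng = np.random.default_rng(0)
n, q, p = 3, 2, 4                      # state dimension, rows, columns
A = rng.standard_normal((n, n))
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((q, n))
D = rng.standard_normal((q, p))
K = rng.standard_normal((n, q))        # arbitrary output injection
lam = 0.7 + 0.3j

N = basis_value(A, E, B, C, D, lam)
N_inj = basis_value(A + K @ C, E, B + K @ D, C, D, lam)
M = np.eye(q) + C @ np.linalg.solve(lam * E - A - K @ C, K)
print(np.abs(N_inj - M @ N).max())
```

Since $M(\lambda)$ multiplies from the left, $N_l(\lambda) G(\lambda) = 0$ immediately gives $\widetilde{N}_l(\lambda) G(\lambda) = 0$ as well.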

Even if the above computed rational basis has the least possible McMillan
degree, and thus is minimal, in general this basis is still not simple. In the next
section, we consider a postprocessing approach that yields a simple basis
from a non-simple one.



20.4 Computation of Simple Bases 



The most obvious approach to determine a simple minimal proper rational basis
has been sketched in Sect. 20.2 and consists in first computing a minimal
polynomial basis $N_l(\lambda)$ and then determining the rational basis as
$\widetilde{N}_l(\lambda) := M_l(\lambda) N_l(\lambda)$, where $M_l(\lambda)$ has the form (20.2).




We briefly discuss the method to compute a polynomial basis proposed in
[2]. This method determines first a minimal polynomial basis $W(\lambda)$ for the left
nullspace of the sub-pencil $\begin{bmatrix} A_l - \lambda E_l \\ C_l \end{bmatrix}$ in (20.6). This computation
can be done by fully exploiting the staircase structure (20.10) of this pencil. The
details for a dual algorithm (for right basis) are presented in [2]. The degrees of
the resulting left basis vectors are equal to the left Kronecker indices, and this
information can be simply read out from the staircase structure. As already
mentioned, there are $p - r$ basis vectors, of which there are $\mu_{i-1} - \mu_i$ vectors
of degree $(i - 1)$.

The minimal polynomial nullspace basis of $G(\lambda)$ results as

$$N_l(\lambda) = [\, 0 \;\; W(\lambda) \,]\, Q \begin{bmatrix} 0 \\ I_p \end{bmatrix}.$$

Note that $W(\lambda)$ and $N_l(\lambda)$ have the same row degrees. Furthermore, it is shown in
[2] that the resulting $N_l(\lambda)$ is row reduced.

The approach to compute a simple proper minimal basis has been sketched in
Sect. 20.2 and additionally involves determining $M(\lambda)$ of the form (20.2), where
$m_i(\lambda)$ is an arbitrary polynomial of degree $n_i$. The resulting simple proper minimal
basis is $\widetilde{N}_l(\lambda) := M(\lambda) N_l(\lambda)$ and has arbitrarily assignable poles. A
state-space realization of the resulting basis $\widetilde{N}_l(\lambda)$ can be simply built by
inspection, exploiting the simpleness property. This realization is obtained by simply
stacking $p - r$ minimal realizations of orders $n_i$, $i = 1, \ldots, p - r$, of each row of
$\widetilde{N}_l(\lambda)$. The resulting state matrix has a block diagonal structure. Although
simple, this approach is not always well suited for applications (e.g., in fault
detection) for reasons which will become apparent in Sect. 20.7.

We propose an alternative to this method which is based on minimum cover
techniques and, as will be shown later, directly supports the design of least order
fault detectors. Consider the proper minimal left nullspace (20.9) and denote by
$c_{l,i}$ and $d_{l,i}$ the $i$th rows of the matrices $C_l$ and $D_l$, respectively.

Theorem 3 For each $i = 1, \ldots, p - r$, let $K_i$ be an output injection matrix such
that

$$v_i(\lambda) := c_{l,i} (\lambda E_l - A_l - K_i C_l)^{-1} (B_l + K_i D_l) + d_{l,i} \tag{20.15}$$

has the least possible McMillan degree. Then, $\widetilde{N}_l(\lambda)$ formed from the $p - r$ rows
$v_i(\lambda)$ is a simple proper minimal left nullspace basis.

Proof According to Proposition 1, the realization (20.9) of $N_l(\lambda)$ is maximally
controllable, i.e., the pair $(A_l + K_i C_l - \lambda E_l, B_l + K_i D_l)$ is controllable for
arbitrary $K_i$. Therefore, the maximal order reduction of the McMillan degree of
$v_i(\lambda)$ can be achieved by making the pair $(A_l + K_i C_l - \lambda E_l, c_{l,i})$ maximally
unobservable via an appropriate choice of $K_i$. For each $i = 1, \ldots, p - r$, the
achievable least McMillan degree of $v_i(\lambda)$ is the corresponding minimal index $n_i$,
representing, in a dual setting, the dimension of the least order controllability
subspace of the standard pair $(E_l^{-\top} A_l^\top, E_l^{-\top} C_l^\top)$ containing
$\mathrm{span}(E_l^{-\top} c_{l,i}^\top)$. This result is the statement of Lemma 6 in [29]. It is easy
to check that $v_i(\lambda) G(\lambda) = 0$, thus $\widetilde{N}_l(\lambda)$ is a left annihilator of $G(\lambda)$.
Furthermore, the set of vectors $\{v_1(\lambda), \ldots, v_{p-r}(\lambda)\}$ is linearly independent
since the realization of $\widetilde{N}_l(\lambda)$ has the same full row rank matrix $D_l$ as that of
$N_l(\lambda)$. It follows that $\widetilde{N}_l(\lambda)$ is a proper left nullspace basis of least
dimension $\sum_{i=1}^{p-r} n_i$, with each row $v_i(\lambda)$ of McMillan degree $n_i$. It follows
that $\widetilde{N}_l(\lambda)$ is simple. □

The poles of the nullspace basis can be arbitrarily placed by performing left 
coprime rational factorizations 

Vi{l) = m,.(A)"'v,-(;0 

The basis Ni{l) :— [v\{a), . . ., vj_,.(l)] obtained in this way, can have arbi- 
trarily assigned poles. 

Simple bases are the direct correspondents of polynomial bases, and therefore 
each operation on a polynomial basis has a direct correspondent operation on the 
corresponding simple rational basis. An important operation (with applications in 
fault detection) is building linear combinations of basis vectors up to a certain 
McMillan degree. 

Consider the proper left nullspace basis $N_l(\lambda)$ constructed in (20.9). By
looking at the details of the resulting staircase form (20.10) of the pair
$(A_l - \lambda E_l, C_l)$, recall that the full column rank matrices $A_{i-1,i} \in \mathbb{R}^{\mu_{i-1} \times \mu_i}$ have the
form

\[ A_{i-1,i} = \begin{bmatrix} R_{i-1,i} \\ 0 \end{bmatrix}, \]

where $R_{i-1,i}$ is an upper-triangular invertible matrix of order $\mu_i$. The row
dimension $\mu_{i-1} - \mu_i$ of the zero block of $A_{i-1,i}$ gives the number of polynomial vectors of
degree $i - 1$ in a minimal polynomial basis [2, Section 4.6] and thus, also the
number of vectors of McMillan degree $i - 1$ in a simple basis. It is straightforward
to show the following result.

Corollary 1 For a given left nullspace basis $N_l(\lambda)$ in the form (20.9), let
$1 \le i \le p - r$ be a given index and let $h$ be a $(p - r)$-dimensional row vector having
only the last $i$ components non-zero. Then, a linear combination of the simple basis
vectors not exceeding McMillan degree $n_i$ can be generated as

\[ v(\lambda) := hC_l(\lambda E_l - A_l - KC_l)^{-1}(B_l + KD_l) + hD_l \tag{20.16} \]

where $K$ is an output injection matrix such that $v(\lambda)$ has the least possible
McMillan degree.

This result shows that the determination of a linear combination of vectors of a
simple basis up to a given order $n_i$ is possible directly from a proper basis
determined in the form (20.9). As will be shown in the next section, the matrix
$K$ together with a minimal realization of $v(\lambda)$ can be computed efficiently using
minimal dynamic cover techniques. The same approach can be applied repeatedly
to determine the basis vectors $v_i(\lambda)$, $i = 1, \ldots, p - r$, of a simple basis by using the
particular choices $h = e_i^T$, where $e_i$ is the $i$th column of the $(p - r)$th order identity
matrix.



20.5 Minimal Dynamic Cover Techniques

Let $N_l(\lambda)$ be the $(p - r) \times p$ minimal proper left nullspace basis of $G(\lambda)$
constructed in (20.9). In this section we will address the following computational
problem encountered when computing simple proper bases or when computing
linear combinations of basis vectors with least McMillan degree: given a row vector
$h$, determine the output injection matrix $K$ such that the vector $v(\lambda)$ in (20.16) has
least McMillan degree. As already mentioned, minimal dynamic cover techniques
can be employed to perform this computation.

Computational procedures of minimal dynamic covers are presented in [22] 
(see also Appendix A). The general idea of the cover algorithms is to perform a 
similarity transformation on the system matrices in (20.9) to bring them in a 
special form which allows to cancel the maximum number of unobservable 
eigenvalues. In a dual setting, for the so-called Type I dynamic covers [8], two 
nonsingular transformation matrices L and V result such that 



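\[
\begin{bmatrix} N_l(\lambda) \\ hN_l(\lambda) \end{bmatrix} =
\left[\begin{array}{c|c}
L(A_l - \lambda E_l)V & LB_l \\ \hline
C_lV & D_l \\
hC_lV & hD_l
\end{array}\right] :=
\left[\begin{array}{cc|c}
\hat A_{11} - \lambda \hat E_{11} & \hat A_{12} - \lambda \hat E_{12} & \hat B_1 \\
\hat A_{21} & \hat A_{22} - \lambda \hat E_{22} & \hat B_2 \\ \hline
\hat C_{11} & \hat C_{12} & D_l \\
0 & \hat C_{22} & hD_l
\end{array}\right]
\tag{20.17}
\]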



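where the pairs $(\hat A_{11} - \lambda \hat E_{11}, \hat C_{11})$ and $(\hat A_{22} - \lambda \hat E_{22}, \hat C_{22})$ are observable, and the
submatrices $\hat C_{11}$ and $\hat A_{21}$ have the particular structure

\[ \begin{bmatrix} \hat A_{21} \\ \hat C_{11} \end{bmatrix} = \begin{bmatrix} 0 & \bar A_{21} \\ 0 & \bar C_{11} \end{bmatrix} \]

with $\bar C_{11}$ having full column rank. By taking

\[ K = V \begin{bmatrix} 0 \\ \bar K \end{bmatrix} \]

with $\bar K$ satisfying $\bar K \bar C_{11} + \bar A_{21} = 0$, we annihilate $\hat A_{21}$, and thus make the pair
$(A_l + KC_l - \lambda E_l, hC_l)$ maximally unobservable by making all eigenvalues of
$\hat A_{11} - \lambda \hat E_{11}$ unobservable. The resulting vector $v(\lambda)$ of least McMillan degree,
obtained by deleting the unobservable part, has the minimal state space realization

\[
v(\lambda) = \left[\begin{array}{c|c}
\hat A_{22} + \bar K \hat C_{12} - \lambda \hat E_{22} & \hat B_2 + \bar K D_l \\ \hline
\hat C_{22} & hD_l
\end{array}\right].
\tag{20.18}
\]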



This is also the typical form of the realizations achieved for the basis vectors (20.15)
of a simple basis. To obtain the above realization, the computation of the
transformation matrices $L$ and $V$ is not necessary, provided all transformations which
are performed during the reductions in the minimal cover algorithm are applied to
the input matrix $B_l$ as well. In Appendix A we present a detailed algorithm for the
computation of Type I dynamic covers.



20.6 Computation of Proper Coprime Factorizations 

We present a straightforward application of minimal proper nullspaces in deter- 
mining proper fractional factorizations of improper rational matrices. This com- 
putation is often a preliminary preprocessing step when designing residual 
generator filters for solving the optimal fault detection problem involving improper 
systems [25]. Let $G(\lambda)$ be a given $p \times m$ improper rational matrix for which we
want to determine a fractional representation in the form

\[ G(\lambda) = M^{-1}(\lambda)N(\lambda), \tag{20.19} \]

where both $M(\lambda)$ and $N(\lambda)$ are proper. In applications, the stability of the factors is
frequently imposed as an additional requirement. For this computation, state space
techniques have been proposed in [18], based on stabilization and pole assignment
methods for descriptor systems. We show that, alternatively, a conceptually simple
and numerically reliable approach can be used to obtain the above factorization.
The relation (20.19) can be rewritten as

\[ [\, M(\lambda) \;\; N(\lambda) \,] \begin{bmatrix} G(\lambda) \\ -I_m \end{bmatrix} = 0. \]

It follows that the $p \times (p + m)$ rational matrix $[\, M(\lambda) \;\; N(\lambda) \,]$ can be determined as a
minimal proper left nullspace basis of the full column rank matrix

\[ G_e(\lambda) = \begin{bmatrix} G(\lambda) \\ -I_m \end{bmatrix}. \]

The invertibility of $M(\lambda)$ is guaranteed by Lemma 2 of [28] by observing that
$[\, M(\lambda) \;\; N(\lambda) \,]$, as a nullspace basis, has full row rank.

Using the state-space realization based algorithm described in Sect. 20.3, we
obtain the left nullspace basis $[\, M(\lambda) \;\; N(\lambda) \,]$ of $G_e(\lambda)$ in the form (20.9) with the
matrices $B_l$ and $D_l$ partitioned accordingly

\[
[\, M(\lambda) \;\; N(\lambda) \,] = \left[\begin{array}{c|cc}
A_l - \lambda E_l & B_{m,l} & B_{n,l} \\ \hline
C_l & D_{m,l} & D_{n,l}
\end{array}\right].
\tag{20.20}
\]

Since $E_l$ is invertible, the resulting factors are proper. An important aspect of this
simple approach is that the state space realizations (20.20) of the factors of the
proper factorization (20.19) have been obtained using exclusively orthogonal
transformations to reduce the system matrix of $G_e(\lambda)$ to a Kronecker-like form as
that in (20.6). This contrasts with the algorithms of [18], which also involve some
non-orthogonal manipulations. The stability of the resulting factors can be
enforced by using, instead of (20.9), a representation of the form (20.14) for the left
nullspace. Here, $K$ is determined to fulfill the stability requirements.
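As a small sanity check of the nullspace characterization of the factors, the following sketch uses the scalar improper matrix $G(\lambda) = \lambda$ together with hand-picked proper and stable factors $M(\lambda) = 1/(\lambda + 1)$ and $N(\lambda) = \lambda/(\lambda + 1)$. The example, the function names, and the sign convention $[\,M\;\;N\,][\,G;\,-I_m\,] = 0$ are illustrative assumptions; the factors are not computed with the algorithm of Sect. 20.3.

```python
import numpy as np

def G(lam):
    """Improper scalar 'matrix' G(lam) = lam (illustrative choice)."""
    return lam

def M(lam):
    """Proper, stable denominator factor chosen by hand."""
    return 1.0 / (lam + 1.0)

def N(lam):
    """Proper, stable numerator factor chosen by hand."""
    return lam / (lam + 1.0)

def nullspace_residual(lam):
    """Value of [M N] * [G; -1] at lam; zero means [M N] annihilates G_e."""
    MN = np.array([[M(lam), N(lam)]])
    Ge = np.array([[G(lam)], [-1.0]])
    return (MN @ Ge).item()

# Arbitrary complex test points away from the pole at -1.
samples = [0.5 + 1.0j, -3.0 + 0.2j, 2.0 - 4.0j]
residuals = [abs(nullspace_residual(s)) for s in samples]
```

The residuals vanish up to rounding at every sample point, and $N(\lambda)/M(\lambda)$ reproduces $G(\lambda)$, which is all that the relation (20.19) requires in the scalar case.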



20.7 Operations Involving Nullspace Bases 

Assume that besides the $p \times m$ rational matrix $G(\lambda)$, we are also given a
$p \times q$ rational matrix $F(\lambda)$, and the compound matrix $[\, G(\lambda) \;\; F(\lambda) \,]$ has the state
space realization

\[
[\, G(\lambda) \;\; F(\lambda) \,] = \left[\begin{array}{c|cc}
A - \lambda E & B & B_f \\ \hline
C & D & D_f
\end{array}\right].
\tag{20.21}
\]

Observe that the realizations of $G(\lambda)$ and $F(\lambda)$ share the same state, descriptor and
output matrices $A$, $E$, and $C$, respectively. Let $N_l(\lambda)$ be a proper left nullspace basis
of $G(\lambda)$ which can be non-simple in the form (20.9) or a simple basis formed
with vectors of the form (20.15). In several applications, besides the computation
of the nullspace basis, operations with the basis matrix are necessary. For example,
the left multiplications $N_l(\lambda)F(\lambda)$ or $\widetilde N_l(\lambda)F(\lambda)$, where $N_l(\lambda) = M_l^{-1}(\lambda)\widetilde N_l(\lambda)$ is a
left coprime factorization, are often necessary in fault detection applications.
Important are also operations involving a linear combination of the basis vectors,
i.e., the computation of $v(\lambda)F(\lambda)$, where $v(\lambda)$ has the form (20.16) or is in a
minimal form (20.18) resulting from the application of the minimal cover
algorithm. This last operation is also important when computing $\widetilde N_l(\lambda)F(\lambda)$ with
$\widetilde N_l(\lambda)$ a simple proper left nullspace basis formed from row vectors of the form
(20.15).

The determination of state space realizations of products like $N_l(\lambda)F(\lambda)$,
$\widetilde N_l(\lambda)F(\lambda)$ or $v(\lambda)F(\lambda)$ can be done by computing minimal realizations of the state
space realizations of these rational matrix products. The computation of a minimal
realization relies on numerically stable algorithms for standard or descriptor
systems such as those proposed in [14, 15]. However, these algorithms depend on
intermediary rank determinations and thus can produce results which critically depend
on the choice of threshold values used to detect zero elements. Since it is always
questionable whether the resulting order is the correct one, this computational approach
can be categorized as a difficult numerical computation. Alternative ways relying
on balancing related model reduction are primarily intended for standard stable
systems. The application of this approach in the case when $F(\lambda)$ is unstable or not
proper leads to other types of numerical difficulties. For example, by assuming that
the unstable/improper part cancels out completely, a preliminary spectral splitting
of eigenvalues must be performed first, which is often associated with unnecessary
accuracy losses. For polynomial nullspace bases the only alternative to the above
approach is to manipulate polynomial matrices. However, as already mentioned, in
some applications this leads to unavoidable detours (state-space to polynomial
model conversions) which involve delicate rank decisions as well.

In what follows, we show that all these numerical difficulties in evaluating the
above products can be completely avoided and explicit state space realizations for
these products can be obtained as a natural byproduct of the nullspace computation
procedure. An important aspect of the developed explicit realizations is that both
$N_l(\lambda)$ and $N_l(\lambda)F(\lambda)$ share the same state, descriptor and output matrices. Thus, the
developed formulas are also useful for performing nullspace updating, and two
important applications of these techniques in the context of fault detection are
presented in the next section.



20.7.1 Left Multiplication with a Non-simple Basis 

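Let $N_l(\lambda)$ be a proper left nullspace basis of $G(\lambda)$ computed in the form (20.9) and
let $Y_l(\lambda)$ be the left nullspace basis of $S(\lambda)$ in (20.4). It is easy to show that

\[
Y_l(\lambda) \begin{bmatrix} A - \lambda E & B_f \\ C & D_f \end{bmatrix} = [\, 0 \;\; N_l(\lambda)F(\lambda) \,]
\]

and thus

\[
N_l(\lambda)F(\lambda) = Y_l(\lambda) \begin{bmatrix} B_f \\ D_f \end{bmatrix} = \overline{Y}_l(\lambda)\, Q \begin{bmatrix} B_f \\ D_f \end{bmatrix},
\]

where $Q$ is the orthogonal transformation matrix used in computing the Kronecker-
like form (20.6) and $\overline{Y}_l(\lambda)$ is defined in (20.7). We compute now

\[
Q \begin{bmatrix} B_f \\ D_f \end{bmatrix} = \begin{bmatrix} * \\ \widetilde B_f \\ \widetilde D_f \end{bmatrix},
\tag{20.22}
\]

where the row partitioning of the right hand side corresponds to the column
partitioning of $\overline{Y}_l(\lambda)$ in (20.7). The realization of $N_l(\lambda)F(\lambda)$ results as

\[
N_l(\lambda)F(\lambda) = \left[\begin{array}{c|c}
A_l - \lambda E_l & \widetilde B_f \\ \hline
C_l & \widetilde D_f
\end{array}\right].
\tag{20.23}
\]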



Note that to compute this realization, only orthogonal transformations have been 
employed. 

In assessing the properties of the resulting realization (20.23), two aspects are
relevant. The realizations of $N_l(\lambda)$ and $N_l(\lambda)F(\lambda)$ are observable since they share
the same $A_l$, $E_l$ and $C_l$ matrices. However, the realization in (20.23) may not be
minimal, because its controllability also depends on the involved $B_f$ and $D_f$. The
second aspect concerns the minimality of the proper left nullspace basis $N_l(\lambda)$
itself. When computing $N_l(\lambda)$, we can freely assume that the overall realization
of $[\, G(\lambda) \;\; F(\lambda) \,]$ is irreducible. According to Proposition 2, to obtain a minimal
proper basis for the left nullspace of $G(\lambda)$ using the proposed rational nullspace
procedure, the corresponding realization in (20.21) must be controllable. Although
this condition is usually fulfilled in fault detection applications (see Sect. 20.8.1),
the realization of $G(\lambda)$ can still be uncontrollable in general, and therefore the
resulting left nullspace basis $N_l(\lambda)$ may not have the least possible McMillan
degree. This can also be the case for the resulting realization (20.23) of $N_l(\lambda)F(\lambda)$.
These two aspects are the reasons why the order of the resulting realization of
$N_l(\lambda)F(\lambda)$ in (20.23) may exceed the least possible one (which can be obtained by
working exclusively with minimal realizations and employing the already
mentioned minimal realization techniques).



20.7.2 Left Coprime Factorization 

Let $N_l(\lambda)$ be the left nullspace basis in (20.9). In several applications, this
rational basis must be stable, that is, it must have in a continuous-time setting only poles
with negative real parts, or in a discrete-time setting only poles inside the unit circle of
the complex plane. As already mentioned, instead of $N_l(\lambda)$ we can freely use as left
nullspace basis $\widetilde N_l(\lambda)$, the numerator factor of the left fractional representation

\[ N_l(\lambda) = M_l^{-1}(\lambda)\widetilde N_l(\lambda), \tag{20.24} \]

where $M_l(\lambda)$ and $\widetilde N_l(\lambda)$ are rational matrices with poles in appropriate stability
domains. A state space realization of $[\, \widetilde N_l(\lambda) \;\; M_l(\lambda) \,]$ is given by well known
formulas [32]

\[
[\, \widetilde N_l(\lambda) \;\; M_l(\lambda) \,] = \left[\begin{array}{c|cc}
A_l + KC_l - \lambda E_l & B_l + KD_l & K \\ \hline
C_l & D_l & I
\end{array}\right],
\tag{20.25}
\]

where $K$ is an appropriate output injection matrix which assigns the eigenvalues of
$A_l + KC_l - \lambda E_l$ to desired positions or to a suitable stability domain. Recall that
this is always possible, since the pair $(A_l - \lambda E_l, C_l)$ is observable. Numerically
reliable algorithms to determine a suitable $K$ can be used based on pole assignment
or stabilization techniques [16]. Alternatively, recursive factorization techniques
such as those proposed in [18] can be employed.
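The identity behind the factorization (20.24) with the realization (20.25) can be checked numerically at a single frequency. The sketch below is only an illustration under stated assumptions: the matrices are random placeholders rather than an actual nullspace basis, the helper name `tfm` is hypothetical, and $K$ is an arbitrary (not necessarily stabilizing) output injection, since the algebraic identity $M_l(\lambda) N_l(\lambda) = \widetilde N_l(\lambda)$ holds for any $K$.

```python
import numpy as np

def tfm(A, E, B, C, D, lam):
    """Evaluate C (lam*E - A)^{-1} B + D at the complex point lam."""
    return C @ np.linalg.solve(lam * E - A, B) + D

rng = np.random.default_rng(0)
n, p, m = 4, 2, 3                      # hypothetical dimensions
A = rng.standard_normal((n, n))
# Upper triangular with unit diagonal, hence invertible.
E = np.eye(n) + np.triu(0.1 * rng.standard_normal((n, n)), k=1)
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, m))
K = rng.standard_normal((n, p))        # arbitrary output injection

lam = 0.7 + 1.3j                       # arbitrary test frequency
N = tfm(A, E, B, C, D, lam)                         # original basis
N_tilde = tfm(A + K @ C, E, B + K @ D, C, D, lam)   # numerator factor
M = tfm(A + K @ C, E, K, C, np.eye(p), lam)         # denominator factor

# Norm of M*N - N_tilde; should be at rounding level.
factorization_error = np.linalg.norm(M @ N - N_tilde)
```

In an actual design, $K$ would be produced by a pole assignment or stabilization routine so that the pencil $A_l + KC_l - \lambda E_l$ has its eigenvalues in the desired stability domain; the check above is independent of that choice.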
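With $U$ of the form (20.12), we can compute

\[
UQ \begin{bmatrix} B_f \\ D_f \end{bmatrix} = \begin{bmatrix} * \\ \widetilde B_f + K\widetilde D_f \\ \widetilde D_f \end{bmatrix}
\]

and in a completely similar way as in the previous subsection, we can obtain the
realization of $\widetilde N_l(\lambda)F(\lambda)$ as

\[
\widetilde N_l(\lambda)F(\lambda) = \left[\begin{array}{c|c}
A_l + KC_l - \lambda E_l & \widetilde B_f + K\widetilde D_f \\ \hline
C_l & \widetilde D_f
\end{array}\right].
\]

When employing the algorithms in [18], a supplementary orthogonal similarity
transformation is also implicitly applied to the resulting system matrices, such that
the resulting pencil $A_l + KC_l - \lambda E_l$ is in a quasi-triangular (generalized real Schur)
form. This computation can be seamlessly integrated into the evaluation of
$\widetilde N_l(\lambda)F(\lambda)$ if we apply the left coprime factorization algorithm directly to the
compound matrix realization

\[
[\, N_l(\lambda) \;\; N_l(\lambda)F(\lambda) \,] = \left[\begin{array}{c|cc}
A_l - \lambda E_l & B_l & \widetilde B_f \\ \hline
C_l & D_l & \widetilde D_f
\end{array}\right].
\]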



In the case of a simple basis, this technique can be employed by considering
fractional representations of the form (20.24) with $M_l(\lambda)$ diagonal. In this case the
same algorithm can be applied to each row of $[\, N_l(\lambda) \;\; N_l(\lambda)F(\lambda) \,]$, by exploiting the
block-diagonal structure of the underlying $A_l - \lambda E_l$ to increase the efficiency of
computations.

We show first how to compute $v(\lambda)F(\lambda)$, where $v(\lambda)$ is given in (20.16). The same
formula applies for a vector of the form (20.15), with obvious replacements. By
observing that $v(\lambda) = h\widetilde N_l(\lambda)$ with $\widetilde N_l(\lambda)$ having the form (20.25), it follows
immediately

\[
v(\lambda)F(\lambda) = \left[\begin{array}{c|c}
A_l + KC_l - \lambda E_l & \widetilde B_f + K\widetilde D_f \\ \hline
hC_l & h\widetilde D_f
\end{array}\right].
\]

In the case when $K$ has been obtained from the cover algorithm, the minimal
realization of $v(\lambda)$ (after eliminating the unobservable part) is given in (20.18). The
corresponding realization of $v(\lambda)F(\lambda)$ is

\[
v(\lambda)F(\lambda) = \left[\begin{array}{c|c}
\hat A_{22} + \bar K \hat C_{12} - \lambda \hat E_{22} & \hat B_{f,2} + \bar K \widetilde D_f \\ \hline
\hat C_{22} & h\widetilde D_f
\end{array}\right],
\]

where we used

\[
L\widetilde B_f = \begin{bmatrix} \hat B_{f,1} \\ \hat B_{f,2} \end{bmatrix}
\]

with the transformation matrix $L$ employed in (20.17) and the row partition
corresponding to that of the input matrix $LB_l$ in (20.17). Note that the explicit
computation of the transformation matrix $L$ is not necessary, because the minimal
realization of the product $v(\lambda)F(\lambda)$ can be directly obtained by applying the
transformations performed in the minimal cover algorithm (see Appendix A) to the
input matrices of the compound realization

\[
\begin{bmatrix} N_l(\lambda) & N_l(\lambda)F(\lambda) \\ hN_l(\lambda) & hN_l(\lambda)F(\lambda) \end{bmatrix} =
\left[\begin{array}{c|cc}
A_l - \lambda E_l & B_l & \widetilde B_f \\ \hline
C_l & D_l & \widetilde D_f \\
hC_l & hD_l & h\widetilde D_f
\end{array}\right].
\]

To compute the products $v_i(\lambda)F(\lambda)$, for $i = 1, \ldots, p - r$, the same approach can be
used taking into account the particular form of $h = e_i^T$.

If $\widetilde N_l(\lambda)$ is a simple basis formed from row vectors of the form (20.16), then the
resulting state space realization for $\widetilde N_l(\lambda)F(\lambda)$ is obtained by stacking the
realizations of $v_i(\lambda)F(\lambda)$ for $i = 1, \ldots, p - r$. Also in this case, $\widetilde N_l(\lambda)$ and $\widetilde N_l(\lambda)F(\lambda)$ will
share the same state, descriptor and output matrices (i.e., $A_l$, $E_l$, $C_l$), and the pole
pencil $A_l - \lambda E_l$ will have a block diagonal form, where the dimensions of the
diagonal blocks are the minimal indices $n_i$.

20.8 Applications to Fault Detection

We consider the linear time-invariant system described by input-output relations
of the form

\[ y(\lambda) = G_u(\lambda)u(\lambda) + G_d(\lambda)d(\lambda) + G_f(\lambda)f(\lambda), \tag{20.26} \]

where $y(\lambda)$, $u(\lambda)$, $d(\lambda)$, and $f(\lambda)$ are Laplace- or Z-transformed vectors of the
$p$-dimensional system output vector $y(t)$, $m_u$-dimensional control input vector $u(t)$,
$m_d$-dimensional disturbance vector $d(t)$, and $m_f$-dimensional fault signal vector $f(t)$,
respectively, and where $G_u(\lambda)$, $G_d(\lambda)$ and $G_f(\lambda)$ are the TFMs from the control
inputs to outputs, disturbances to outputs, and fault signals to outputs, respectively.
In what follows we will address three applications of the techniques developed
in the previous sections.



20.8.1 Solving Fault Detection Problems with Least Order 
Detectors 

The following is the standard formulation of the Fault Detection Problem (FDP):
Determine a proper and stable linear residual generator (or fault detector) having
the general form

\[ r(\lambda) = R(\lambda) \begin{bmatrix} y(\lambda) \\ u(\lambda) \end{bmatrix} \tag{20.27} \]

such that: (i) $r(t) = 0$ when $f(t) = 0$ for all $u(t)$ and $d(t)$; and (ii) $r(t) \neq 0$ when
$f_i(t) \neq 0$, for $i = 1, \ldots, m_f$. Besides the above requirements, it is often required for
practical use that the TFM of the detector $R(\lambda)$ has the least possible McMillan
degree. Note that as fault detector, we can always choose $R(\lambda)$ as a rational row
vector.

The requirements (i) and (ii) can be easily transcribed into equivalent algebraic
conditions. The (decoupling) condition (i) is equivalent to

\[ R(\lambda)G(\lambda) = 0, \tag{20.28} \]

where

\[ G(\lambda) = \begin{bmatrix} G_u(\lambda) & G_d(\lambda) \\ I_{m_u} & 0 \end{bmatrix}, \tag{20.29} \]

while the (detectability) condition (ii) is equivalent to

\[ R_{f_i}(\lambda) \neq 0, \quad i = 1, \ldots, m_f, \tag{20.30} \]

where $R_{f_i}(\lambda)$ is the $i$th column of

\[ R_f(\lambda) := R(\lambda) \begin{bmatrix} G_f(\lambda) \\ 0 \end{bmatrix}. \tag{20.31} \]

Let $G_{f_i}(\lambda)$ be the $i$th column of $G_f(\lambda)$. A necessary and sufficient condition for the
existence of a solution is the following one [3, 11]:

Theorem 4 For the system (20.26) the FDP is solvable if and only if

\[ \operatorname{rank}\,[\, G_d(\lambda) \;\; G_{f_i}(\lambda) \,] > \operatorname{rank}\, G_d(\lambda), \quad i = 1, \ldots, m_f. \tag{20.32} \]
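The rank condition (20.32) can be probed numerically by evaluating the TFMs at a generic complex frequency, where the rank equals the normal rank for almost every choice of the test point. The following is a minimal sketch under stated assumptions: the TFMs are supplied as Python callables, the example system is hypothetical, and a single test point with a fixed tolerance is a heuristic, not a reliable rank decision.

```python
import numpy as np

def normal_rank(tfm_value, tol=1e-9):
    """Numerical rank of a TFM evaluated at one generic frequency."""
    return np.linalg.matrix_rank(tfm_value, tol=tol)

def fdp_solvable(Gd, Gf, lam=0.37 + 1.91j, tol=1e-9):
    """Check (20.32) column by column; Gd, Gf map lam to 2-D arrays."""
    Gd_val, Gf_val = Gd(lam), Gf(lam)
    r_d = normal_rank(Gd_val, tol)
    return [
        normal_rank(np.hstack([Gd_val, Gf_val[:, [i]]]), tol) > r_d
        for i in range(Gf_val.shape[1])
    ]

# Hypothetical 2-output example: fault 2 enters exactly like the disturbance.
Gd = lambda s: np.array([[1 / (s + 1)], [1 / (s + 1)]])
Gf = lambda s: np.array([[1 / (s + 2), 1 / (s + 1)], [0, 1 / (s + 1)]])

detectable = fdp_solvable(Gd, Gf)
```

In this example the first fault satisfies the condition while the second does not, because its column coincides with the disturbance column, so the FDP as a whole is not solvable for this hypothetical system.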

From (20.28) it appears that $R(\lambda)$ is a left annihilator of $G(\lambda)$, thus one possibility
to determine $R(\lambda)$ is to compute first a left minimal basis $N_l(\lambda)$ for the left
nullspace of $G(\lambda)$, and then to build a stable scalar output detector as

\[ R(\lambda) = h(\lambda)N_l(\lambda), \tag{20.33} \]

representing a linear combination of the rows of $N_l(\lambda)$, such that conditions (20.30)
are fulfilled. The above expression represents a parametrization of all possible
scalar output fault detectors and is the basis of the so-called nullspace methods.

The first nullspace method to design residual generators for fault detection has 
been formally introduced in [5], where a polynomial basis based approach was 
used. This approach has been later extended to rational bases in [20, 24]. The main 
advantage of the nullspace approach is that the least order design aspect is naturally 
present in the formulation of the method. In a recent survey [26], it was shown that 
the nullspace method also provides a unifying design paradigm for most of existing 
approaches, which can be interpreted as special cases of this method. 

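Consider a descriptor state space realization of (20.26)

\[
\begin{aligned}
E\lambda x(t) &= Ax(t) + B_u u(t) + B_d d(t) + B_f f(t) \\
y(t) &= Cx(t) + D_u u(t) + D_d d(t) + D_f f(t),
\end{aligned}
\tag{20.34}
\]

where $\lambda x(t) = \dot x(t)$ or $\lambda x(t) = x(t + 1)$ depending on the type of the system,
continuous or discrete, respectively. For convenience, in what follows we assume
the pair $(A - \lambda E, C)$ is observable and the pair $(A - \lambda E, [\, B_u \;\; B_d \,])$ is controllable.
This latter condition is typically fulfilled when considering actuator and sensor
faults. In this case, $B_f$ has partly the same columns as $B_u$ (in the case of actuator
faults) or zero columns (in the case of sensor faults).
$G(\lambda)$ defined in (20.29) has the irreducible realization

\[
G(\lambda) = \left[\begin{array}{c|cc}
A - \lambda E & B_u & B_d \\ \hline
C & D_u & D_d \\
0 & I_{m_u} & 0
\end{array}\right].
\]

Using the method described in Sect. 20.3, we compute first a minimal proper
left nullspace basis $N_l(\lambda)$ of $G(\lambda)$. The state space realization of the $(p - r) \times
(p + m_u)$ TFM $N_l(\lambda)$ is given by (20.9), where $r$ is the rank of $G_d(\lambda)$.

To check the existence conditions of Theorem 4, we use (20.23) to compute

\[
N_f(\lambda) := N_l(\lambda) \begin{bmatrix} G_f(\lambda) \\ 0 \end{bmatrix} =
\left[\begin{array}{c|c}
A_l - \lambda E_l & \widetilde B_f \\ \hline
C_l & \widetilde D_f
\end{array}\right],
\tag{20.35}
\]

where

\[
Q \begin{bmatrix} B_f \\ D_f \\ 0 \end{bmatrix} = \begin{bmatrix} * \\ \widetilde B_f \\ \widetilde D_f \end{bmatrix}.
\]

Since the pair $(A_l - \lambda E_l, C_l)$ is observable, checking the condition (20.30) is
equivalent to verifying that

\[
\begin{bmatrix} \widetilde B_{f_i} \\ \widetilde D_{f_i} \end{bmatrix} \neq 0, \quad i = 1, \ldots, m_f,
\]

where $\widetilde B_{f_i}$ and $\widetilde D_{f_i}$ denote the $i$th columns of $\widetilde B_f$ and $\widetilde D_f$, respectively.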
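The column test above reduces the detectability check to inspecting the input and feedthrough matrices of the realization (20.35). A minimal sketch of this check follows; the function name, the tolerance and the numerical data are illustrative assumptions, and the test is only meaningful when the underlying pair $(A_l - \lambda E_l, C_l)$ is observable, as stated in the text.

```python
import numpy as np

def detectable_from_realization(Bf_t, Df_t, tol=1e-10):
    """Boolean flag per fault: True if column i of [Bf_t; Df_t] is nonzero."""
    stacked = np.vstack([Bf_t, Df_t])
    return np.linalg.norm(stacked, axis=0) > tol

# Hypothetical data: 3 states, 2 residual outputs, 3 faults.
Bf_t = np.array([[1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.5, 0.0, 0.0]])
Df_t = np.array([[0.0, 0.0, 2.0],
                 [0.0, 0.0, 0.0]])

flags = detectable_from_realization(Bf_t, Df_t)
```

Here the second fault has a zero column in both matrices, so it would be flagged as not detectable, while the first and third faults pass the test.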

To address the determination of least order scalar output detectors, we can
compute linear combinations of the basis vectors of a simple proper basis of
increasing McMillan degrees and check the detectability condition (20.30) for the
resulting vectors (seen as candidate detectors). According to Corollary 1, this
comes down to choosing an appropriate $h$ and obtaining the corresponding $K$ such that
the row vector $v(\lambda) = h\widetilde N_l(\lambda)$ in (20.16) has the least possible McMillan degree.
Note that in general, with a randomly generated $h$, one achieves a detector whose
order is $\ell$, the maximum degree of a minimal polynomial basis. Recall that $\ell$ is the
number of nonzero subdiagonal blocks in the Kronecker-like form (20.10) and
represents the observability index of the observable pair $(A_l - \lambda E_l, C_l)$. In the case
when no disturbance inputs are present, this is a well-known result in designing
functional observers [9].

Lower order detectors can be obtained using particular choices of the row
vector $h$. Using Corollary 1, by choosing $h$ with only the trailing $i$ components
nonzero, the corresponding linear combination of $i$ basis vectors has McMillan
degree $n_i$. A systematic search can be performed by generating successive
candidates for $h$ with increasing number of nonzero elements and checking for the
resulting residual generator the conditions (20.30). The resulting detectors have
non-decreasing orders and thus the first detector satisfying these conditions
represents a satisfactory least order design. To speed up the selection, the choice of
the nonzero components of $h$ can be done such that for a given tentative order $n_i$ a
combination of all $\mu_0 - \mu_i$ intervening vectors of order less than or equal to $n_i$ is
built. In this way, repeated checks for the same order are avoided and the search is
terminated in at most $\ell$ steps.

For the final design, the resulting dynamics of the detector can be arbitrarily
assigned by choosing the detector in the form

\[ R(\lambda) = m(\lambda)h\widetilde N_l(\lambda), \]

where $m(\lambda)$ is an appropriate scalar transfer function. Note that the resulting least
order at the previous step is preserved provided $m(\lambda)$ is computed using coprime
factorization techniques [18].
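The systematic search over $h$ described in this subsection starts from candidate vectors whose trailing components are the only nonzero ones. The sketch below generates such candidates only; the function name and the random choice of the nonzero entries are assumptions, and the computation of $K$ by the cover algorithm and the check of (20.30) for each candidate are deliberately left out.

```python
import numpy as np

def candidate_vectors(num_basis_vectors, seed=0):
    """Yield (i, h) with only the trailing i entries of h nonzero."""
    rng = np.random.default_rng(seed)
    for i in range(1, num_basis_vectors + 1):
        h = np.zeros(num_basis_vectors)
        # Entries are drawn away from zero so the pattern is unambiguous.
        h[-i:] = rng.uniform(0.5, 1.5, size=i)
        yield i, h

# Hypothetical case with p - r = 4 basis vectors.
candidates = list(candidate_vectors(4))
```

In a full design loop, each candidate would be passed to the least order synthesis and the first one whose detector satisfies the detectability conditions would be retained.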



20.8.2 Solving Fault Isolation Problems 



The more advanced functionality of fault isolation (i.e., exact location of faults) 
can be often achieved by designing a bank of fault detectors [6] or by direct design 
of fault isolation filters [21]. Designing detectors which are sensitive to some faults 
and insensitive to others can be reformulated as a standard FDP, by formally
redefining the faults to be rejected in the residual as fictive disturbances.

Let $R(\lambda)$ be a given detector and let $R_f(\lambda)$ be the corresponding fault-to-residual
TFM in (20.31). We define the fault signature matrix $S$, with the $(i,j)$ entry $S_{ij}$
given by

$S_{ij} = 1$, if the $(i,j)$ entry of $R_f(\lambda)$ is nonzero;
$S_{ij} = 0$, if the $(i,j)$ entry of $R_f(\lambda)$ is zero.

If $S_{ij} = 1$, then we say that the fault $j$ is detected in residual $i$ and if $S_{ij} = 0$, then
the fault $j$ is decoupled (not detected) in residual $i$.
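The definition of the fault signature matrix can be illustrated by evaluating a fault-to-residual TFM at a few frequencies and recording which entries are nonzero. This is a heuristic sketch under stated assumptions: the TFM is given as a Python callable, the test points and tolerance are arbitrary, and the example TFM is hypothetical. A structural test on a state space realization, as used later in the chapter, is the more reliable route.

```python
import numpy as np

def signature_matrix(Rf, points=(0.3 + 1.1j, -0.7 + 2.4j), tol=1e-9):
    """S[i, j] = 1 if entry (i, j) of Rf is nonzero at some test point."""
    magnitudes = np.max([np.abs(Rf(s)) for s in points], axis=0)
    return (magnitudes > tol).astype(int)

# Hypothetical fault-to-residual TFM with two residuals and three faults.
Rf = lambda s: np.array([[1 / (s + 1), 0, s / (s + 2)],
                         [0, 1 / (s + 3), 0]])

S = signature_matrix(Rf)
```

For this example, residual 1 is sensitive to faults 1 and 3 and decoupled from fault 2, while residual 2 is sensitive to fault 2 only.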

The following fault detection and isolation problem (FDIP) can be now
formulated: Given a $q \times m_f$ fault signature matrix $S$, determine a bank of $q$ stable and
proper scalar output residual generator filters

\[ r_i(\lambda) = R^i(\lambda) \begin{bmatrix} y(\lambda) \\ u(\lambda) \end{bmatrix}, \quad i = 1, \ldots, q \tag{20.36} \]

such that, for all $u(t)$ and $d(t)$ we have:

(i) $r_i(t) = 0$ when $f_j(t) = 0$, $\forall j$ with $S_{ij} \neq 0$;
(ii) $r_i(t) \neq 0$ when $f_j(t) \neq 0$, $\forall j$ with $S_{ij} \neq 0$.

In this formulation of the FDIP, each scalar output detector $R^i(\lambda)$ achieves the
fault signature specified by the $i$th row of the desired fault signature matrix $S$. The
resulting global detector corresponding to this $S$ can be assembled as

\[ R(\lambda) = \begin{bmatrix} R^1(\lambda) \\ \vdots \\ R^q(\lambda) \end{bmatrix}. \tag{20.37} \]

Let $S$ be a given $q \times m_f$ fault signature matrix and denote by $\widehat G_f^{\,i}(\lambda)$ the matrix
formed from the columns of $G_f(\lambda)$ whose column indices $j$ correspond to zero
elements in row $i$ of $S$. The solvability conditions of the FDIP build up from the
solvability of $q$ individual FDPs.

Theorem 5 For the system (20.26) the FDIP with the given fault signature matrix
$S$ is solvable if and only if for each $i = 1, \ldots, q$, we have

\[ \operatorname{rank}\,[\, G_d(\lambda) \;\; \widehat G_f^{\,i}(\lambda) \;\; G_{f_j}(\lambda) \,] > \operatorname{rank}\,[\, G_d(\lambda) \;\; \widehat G_f^{\,i}(\lambda) \,] \tag{20.38} \]

for all $j$ such that $S_{ij} \neq 0$.

The standard approach to determine $R(\lambda)$ is to design for each row $i$ of the fault
signature matrix $S$, a detector $R^i(\lambda)$ which generates the $i$th residual signal $r_i(t)$, and
thus represents the $i$th row of $R(\lambda)$. For this purpose, the nullspace method of the
previous subsection can be applied with $G(\lambda)$ in (20.29) replaced by

\[
G(\lambda) = \begin{bmatrix} G_u(\lambda) & G_d(\lambda) & \widehat G_f^{\,i}(\lambda) \\ I_{m_u} & 0 & 0 \end{bmatrix}
\]

and with a redefined fault to output TFM $\widetilde G_f^{\,i}(\lambda)$, formed from the columns of $G_f(\lambda)$
whose indices $j$ correspond to $S_{ij} \neq 0$. The McMillan degree of the global detector
(20.37) is bounded by the sum of the McMillan degrees of the component
detectors. Note that this upper bound can be effectively achieved, for example, by
choosing mutually different poles for the individual detectors. It is to be expected
that lower orders result when the scalar detectors share their poles.

Using the least order design techniques described in this paper, for each row of
$S$ we can design a scalar output detector of least McMillan degree. However, even
if each detector has the least possible order, there is generally no guarantee that the
resulting order of $R(\lambda)$ is also the least possible one. To the best of our knowledge,
the determination of a detector of least global McMillan degree for a given
specification $S$ is still an open problem. A solution to this problem has been
recently suggested in [24] and involves a post processing step as follows.

Assume that the resulting least order scalar detector $R^i(\lambda)$ has McMillan
degree $\nu_i$, for $i = 1, \ldots, q$. We can easily ensure that for $\nu_i \le \nu_j$, the poles of $R^i(\lambda)$
are among the poles of $R^j(\lambda)$. The resulting global detector $R(\lambda)$ according to
(20.37) has a McMillan degree which is conjectured in [24] to be the least
possible one.

We describe now an improved approach in two steps to design a bank of 
detectors, which for larger values of q, is potentially more efficient than the above 
standard approach. In a first step, we can reduce the complexity of the original 
problem by decoupling the influences of disturbances and control inputs on the 
residuals. In a second step, a residual generation filter is determined for a system
without control and disturbance inputs which achieves the desired fault signature.

Let $N_l(\lambda)$ be a minimal left nullspace basis for $G(\lambda)$ defined in (20.29) and
define a new system without control and disturbance inputs as

\[ \tilde y(\lambda) := N_f(\lambda)f(\lambda), \tag{20.39} \]

where

\[ N_f(\lambda) := N_l(\lambda) \begin{bmatrix} G_f(\lambda) \\ 0 \end{bmatrix}. \tag{20.40} \]

The system (20.39) has generally a reduced McMillan degree and also a reduced
number of outputs $p - r$, where $r$ is the normal rank of $G_d(\lambda)$. The state space
realization of the resulting $N_f(\lambda)$ is given in (20.35). Observe that $N_l(\lambda)$ and $N_f(\lambda)$
share the same state, descriptor and output matrices in their realizations.

For the reduced system (20.39) with TFM $N_f(\lambda)$ we can determine, using the
standard approach, a bank of $q$ scalar output least order detectors of the form

\[ r_i(\lambda) = \widetilde R^i(\lambda)\tilde y(\lambda), \quad i = 1, \ldots, q \tag{20.41} \]

such that the same conditions are fulfilled as for the original FDIP. The TFM of the
final detector can be assembled as

\[ R(\lambda) = \begin{bmatrix} \widetilde R^1(\lambda) \\ \vdots \\ \widetilde R^q(\lambda) \end{bmatrix} N_l(\lambda). \tag{20.42} \]

Comparing (20.42) and (20.37) we have

\[ R^i(\lambda) = \widetilde R^i(\lambda)N_l(\lambda), \tag{20.43} \]

which can also be interpreted as an updating formula of a preliminary (incomplete)
design. The resulting order of the $i$th detector is the same as before, but this
two-step approach has the advantage that the nullspace computation and the
associated least order design involve systems of reduced orders (in the sizes of state,
input and output vectors). The realization of $R^i(\lambda)$ can be obtained using the
explicit formulas derived in Sect. 20.7.1.

The improved approach relies on the detector updating techniques which can be 
easily performed using the explicit realizations of the underlying products. This 
can be seen as a major advantage of rational nullspace based methods in contrast to 
polynomial nullspace based computations. 

The above procedure has been used for the example studied in [31, Table 2],
where an 18 x 9 fault signature matrix $S$ served as specification. The underlying
system has order 4. Each line of $S$ can be realized by a detector of order 1 or 2 with
eigenvalues $\{-1\}$ or $\{-1, -2\}$. The sum of orders of the resulting individual
detectors is 32, but the resulting global detector $R(\lambda)$ has McMillan degree 6.
Recall that the "least order" detector computed in [31] has order 14.

20.8.3 The Computation of Achievable Fault Signature

An aspect apparently not addressed until recently in the literature is the generation
of the achievable complete fault signature specification for a FDIP. Traditionally
this aspect is addressed by trying to design a bank of detectors to achieve a desired
specification matrix $S$. The specification is achievable if the design was successful.
However, it is possible to generate systematically all possible specifications using
an exhaustive search. For this purpose, a recursive procedure can be devised which
has as inputs the $p \times m$ and $p \times m_f$ TFMs $G(\lambda)$ and $F(\lambda)$ and as output the
corresponding signature matrix $S$. If we denote this procedure as FDISPEC(G,F),
then the fault signature matrix for the system (20.26) can be computed as

\[
S = \mathrm{FDISPEC}\left( \begin{bmatrix} G_u & G_d \\ I_{m_u} & 0 \end{bmatrix}, \begin{bmatrix} G_f \\ 0 \end{bmatrix} \right).
\]

Procedure S = FDISPEC(G,F)

1. Compute a left nullspace basis $N_l(\lambda)$ of $G(\lambda)$; exit with empty $S$ if $N_l(\lambda)$ is
empty.

2. Compute $N_f(\lambda) = N_l(\lambda)F(\lambda)$.

3. Compute the signature matrix $S$ of $N_f(\lambda)$; exit if $S$ is a row vector.

4. For $i = 1, \ldots, m_f$

4.1 Form $\widetilde G_i(\lambda)$ as column $i$ of $N_f(\lambda)$.

4.2 Form $\widetilde F_i(\lambda)$ from the columns $1, \ldots, i - 1, i + 1, \ldots, m_f$ of $N_f(\lambda)$.

4.3 Call $\widetilde S$ = FDISPEC($\widetilde G_i$, $\widetilde F_i$).

4.4 Partition $\widetilde S = [\, \widetilde S_1 \;\; \widetilde S_2 \,]$ such that $\widetilde S_1$ has $i - 1$ columns.

4.5 Define $\widehat S = [\, \widetilde S_1 \;\; 0 \;\; \widetilde S_2 \,]$ and update $S \leftarrow \begin{bmatrix} S \\ \widehat S \end{bmatrix}$.

As can be observed, an efficient implementation of this procedure benefits
heavily from the state space updating techniques developed in Sect. 20.7.1. These
techniques make the recursive calls increasingly cheap, because the dimensions
of the systems decrease during a full recursion. The current recursion is
broken each time an empty nullspace is encountered or when the last possible
recursion level has been attained (i.e., S at Step 3 is a row vector). The
computation of structural information at Step 3 involves checking for zero columns
in the input and feedthrough matrices $B_f$ and $D_f$ of the realization of $N_f(\lambda)$ in (20.23).
Note that the whole recursive computation can be performed by using exclusively
orthogonal transformations.

The above procedure can be easily implemented such that it performs the
minimum number of nullspace computations and updating. The resulting fault
signature matrix S is obtained by stacking row-wise the matrices $S_i$, $i = 1, \ldots, k$,
computed at different recursion levels, where k denotes the number of calls of the
recursive procedure. This number is given by the combinatorial formula

$$k = \sum_{i=0}^{i_{\max}} \binom{m_f}{i},$$

where $i_{\max} = \min(m_f, p - r) - 1$ and r is the rank of the initial $G(\lambda)$. As can be
observed, k depends on the number of basis vectors $p - r$ and the number of faults
$m_f$, and, although the number of distinct specifications can be relatively low, k can
still be a large number. For the already mentioned problem in [31],
$k = 1 + m_f + m_f(m_f - 1)/2 = 37$, but only 18 of the determined specifications are
distinct. A detailed account of the computational aspects of the procedure FDISPEC
is given in [27].
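
The count k can be checked numerically. The following is a minimal sketch, not part of the FDISPEC software: the function name `fdispec_call_count` is hypothetical, and the values m_f = 8 and p - r = 3 used in the example are inferred from the relation 1 + m_f + m_f(m_f - 1)/2 = 37 quoted above rather than stated explicitly in the text.

```python
from math import comb


def fdispec_call_count(m_f: int, p: int, r: int) -> int:
    """Number k of recursive calls: sum of C(m_f, i) for i = 0..i_max,
    with i_max = min(m_f, p - r) - 1 (p - r = number of basis vectors)."""
    i_max = min(m_f, p - r) - 1
    return sum(comb(m_f, i) for i in range(i_max + 1))


# Assumed illustration: m_f = 8 faults and p - r = 3 basis vectors give
# i_max = 2, hence k = 1 + 8 + 28.
print(fdispec_call_count(m_f=8, p=5, r=2))
```

The sum grows quickly with the number of basis vectors p - r, which illustrates why k can be large even when few specifications are distinct.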



20.9 Conclusions 

In this paper we presented an overview of computational techniques to determine 
rational nullspace bases of rational or polynomial matrices. Simple proper rational 
bases are the direct correspondents of the polynomial bases and can be computed 
using the proposed numerical algorithms based on minimal cover techniques. 
With potential applications in mind, we also developed explicit realizations for
several operations with nullspace bases or with a linear combination of vectors of a
nullspace basis. The computational techniques presented in this paper have been
implemented as robust numerical software which is now part of a Descriptor
System Toolbox for Matlab developed by the authors over the last decade [19].

The rational nullspace computation based techniques make it possible to solve
important applications such as the FDP or the FDIP. A main feature of rational nullspace
based techniques is a full flexibility in addressing different aspects of these 
problems, like computing least order detectors, checking existence conditions, 
computing the achievable fault signature, or employing updating techniques to 
design a bank of detectors to solve the FDIP. The underlying computations 
extensively use orthogonal similarity transformations to perform the important 
computational steps, as for example, to determine a proper nullspace basis or to 
check the existence conditions of a solution. In contrast, methods based on 
polynomial nullspace computations are less flexible, and involve computational 
detours, which are highly questionable from a numerical point of view. 

An interesting result of our studies is that although using simple proper rational 
bases leads to a straightforward solution of the FDP with least order detectors, the 
computation of the simple basis is not actually necessary. Since we need to compute 
only linear combinations of simple basis vectors when solving the FDP with least 
order detector, this computation can be directly performed starting with a minimal
proper basis which can be obtained using exclusively orthogonal pencil
manipulations. The linear combinations of basis vectors up to a given McMillan degree can
be computed using numerical algorithms based on minimal cover techniques. This
aspect is highly relevant for implementing robust and efficient numerical software
such as that available in a recent Fault Detection Toolbox for Matlab [23].




Appendix A: Computation of Minimal Dynamic Covers

The computational problem which we solve in this section is the following: given
a descriptor pair $(A - \lambda E, B)$ with $A, E \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and B partitioned as
$B = [\,B_1 \;\; B_2\,]$ with $B_1 \in \mathbb{R}^{n \times m_1}$, $B_2 \in \mathbb{R}^{n \times m_2}$, determine the matrix $F \in \mathbb{R}^{m_2 \times n}$ such
that the pair $(A + B_2F - \lambda E, B_1)$ is maximally uncontrollable (i.e., $A + B_2F - \lambda E$
has maximal number of uncontrollable eigenvalues). A dual problem to be solved
in Sect. 20.5 deals with an observable pair $(A - \lambda E, C)$ with nonsingular E and
with a C matrix partitioned as

$$C = \begin{bmatrix} C_1 \\ C_2 \end{bmatrix}.$$

In this case, a matrix K is sought such that the pair $(A + KC_2 - \lambda E, C_1)$ is
maximally unobservable. For convenience and in agreement with the assumptions of
the problem to be solved in Sect. 20.5, we will describe a computational method
which is suitable for a controllable pair $(A - \lambda E, B)$ with nonsingular E. However,
such an algorithm can be immediately applied to solve the above dual problem by
applying it to the controllable pair $(A^T - \lambda E^T, C^T)$ to determine $K^T$.

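The problem to determine F which makes the pair $(A + B_2F - \lambda E, B_1)$
maximally uncontrollable is equivalent [30] to computing a subspace $\mathcal{V}$ of least possible
dimension satisfying

$$(\bar A + \bar B_2 F)\mathcal{V} \subset \mathcal{V}, \quad \mathrm{span}(\bar B_1) \subset \mathcal{V}, \tag{20.44}$$

where $\bar A = E^{-1}A$, $\bar B_1 = E^{-1}B_1$, and $\bar B_2 = E^{-1}B_2$. This subspace is the least order
$(\bar A, \bar B_2)$-invariant subspace which contains $\mathrm{span}(\bar B_1)$ [30]. The condition (20.44)
can be rewritten as

$$\bar A \mathcal{V} \subset \mathcal{V} + \mathrm{span}(\bar B_2), \quad \mathrm{span}(\bar B_1) \subset \mathcal{V}, \tag{20.45}$$

which is the condition defining the subspace $\mathcal{V}$ as a Type I dynamic cover [8].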
In this appendix we describe a computational method for determining minimal
dynamic covers, which relies on the reduction of the descriptor pair
$(A - \lambda E, [B_1, B_2])$ to a particular condensed form, for which the solution of the problem
(i.e., the choice of appropriate F) is simple. This reduction is performed in two
stages. The first stage is an orthogonal reduction which represents a particular
instance of the descriptor controllability staircase procedure of [15] applied to the
descriptor pair $(A - \lambda E, [B_1, B_2])$. This procedure can be interpreted as a
generalized orthogonal variant of the basis selection approach underlying the
determination of Type I minimal covers in [8]. In the second stage, additional
zero blocks are generated in the reduced matrices using non-orthogonal
transformations. With additional blocks zeroed via a specially chosen F, the least
order $(\bar A, \bar B_2)$-invariant subspace containing $\mathrm{span}(\bar B_1)$ can be identified as
the linear span of the leading columns of the resulting right transformation
matrix. In what follows we present in detail these two stages as well as the
determination of F.



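Stage I: Special Controllability Staircase Algorithm

0. Compute an orthogonal matrix Q such that $Q^TE$ is upper triangular; compute
$A \leftarrow Q^TA$, $E \leftarrow Q^TE$, $B_1 \leftarrow Q^TB_1$, $B_2 \leftarrow Q^TB_2$.

1. Set $j = 1$, $r = 0$, $k = 2$, $\nu_1^{(0)} = m_1$, $\nu_2^{(0)} = m_2$, $A^{(0)} = A$, $E^{(0)} = E$, $B_1^{(0)} = B_1$,
$B_2^{(0)} = B_2$, $Z = I_n$.

2. Compute an orthogonal matrix $W_1$ such that

$$W_1^T \big[\, B_1^{(j-1)} \;\big|\; B_2^{(j-1)} \,\big] = \begin{bmatrix} A_{k-1,k-3} & A_{k-1,k-2} \\ O & A_{k,k-2} \\ O & O \end{bmatrix},$$

where the block rows have $\nu_1^{(j)}$, $\nu_2^{(j)}$, and $\rho$ rows, the block columns have $\nu_1^{(j-1)}$ and
$\nu_2^{(j-1)}$ columns, and $A_{k-1,k-3}$ and $A_{k,k-2}$ are full row rank matrices; compute an
orthogonal matrix $U_1$ such that $W_1^TE^{(j-1)}U_1$ is upper triangular.

3. Compute and partition

$$W_1^TA^{(j-1)}U_1 := \begin{bmatrix} A_{k-1,k-1} & A_{k-1,k} & A_{k-1,k+1} \\ A_{k,k-1} & A_{k,k} & A_{k,k+1} \\ B_1^{(j)} & B_2^{(j)} & A^{(j)} \end{bmatrix}, \quad
W_1^TE^{(j-1)}U_1 := \begin{bmatrix} E_{k-1,k-1} & E_{k-1,k} & E_{k-1,k+1} \\ O & E_{k,k} & E_{k,k+1} \\ O & O & E^{(j)} \end{bmatrix},$$

where, in both matrices, the block rows and the block columns have dimensions
$\nu_1^{(j)}$, $\nu_2^{(j)}$, and $\rho$.

4. For $i = 1, \ldots, k - 2$ compute and partition

$$A_{i,k-1}U_1 := [\, A_{i,k-1} \;\; A_{i,k} \;\; A_{i,k+1} \,], \quad E_{i,k-1}U_1 := [\, E_{i,k-1} \;\; E_{i,k} \;\; E_{i,k+1} \,],$$

where the block columns have $\nu_1^{(j)}$, $\nu_2^{(j)}$, and $\rho$ columns.

5. $Q \leftarrow Q\,\mathrm{diag}(I_r, W_1)$, $Z \leftarrow Z\,\mathrm{diag}(I_r, U_1)$.

6. If $\nu_1^{(j)} = 0$ then $\ell = j - 1$ and Exit.

7. $r \leftarrow r + \nu_1^{(j)} + \nu_2^{(j)}$; if $\rho = 0$ then $\ell = j$ and Exit;
else, $j \leftarrow j + 1$, $k \leftarrow k + 2$, and go to Step 2.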

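At the end of this algorithm $\hat A - \lambda \hat E := Q^T(A - \lambda E)Z$, $\hat B := Q^TB$, $\hat E$ is upper
triangular, and the pair $(\hat A, \hat B)$ is in a special staircase form. For example, for $\ell = 3$
and $r < n$, $[\hat B \;\; \hat A]$ and $\hat E$ have similarly block partitioned forms

$$[\, \hat B \,|\, \hat A \,] = \left[\begin{array}{cc|ccccccc}
A_{1,-1} & A_{1,0} & A_{11} & A_{12} & A_{13} & A_{14} & A_{15} & A_{16} & A_{17} \\
O & A_{2,0} & A_{21} & A_{22} & A_{23} & A_{24} & A_{25} & A_{26} & A_{27} \\
O & O & A_{31} & A_{32} & A_{33} & A_{34} & A_{35} & A_{36} & A_{37} \\
O & O & O & A_{42} & A_{43} & A_{44} & A_{45} & A_{46} & A_{47} \\
O & O & O & O & A_{53} & A_{54} & A_{55} & A_{56} & A_{57} \\
O & O & O & O & O & A_{64} & A_{65} & A_{66} & A_{67} \\
O & O & O & O & O & O & O & A_{76} & A_{77}
\end{array}\right],$$

$$\hat E = \begin{bmatrix}
E_{11} & E_{12} & \cdots & E_{16} & E_{17} \\
O & E_{22} & \cdots & E_{26} & E_{27} \\
\vdots & & \ddots & \vdots & \vdots \\
O & O & \cdots & E_{66} & E_{67} \\
O & O & \cdots & O & E_{77}
\end{bmatrix}.$$

In the special staircase form $[\hat B \;\; \hat A]$, $A_{2j-1,2j-3} \in \mathbb{R}^{\nu_1^{(j)} \times \nu_1^{(j-1)}}$ and
$A_{2j,2j-2} \in \mathbb{R}^{\nu_2^{(j)} \times \nu_2^{(j-1)}}$ are full row rank matrices for $j = 1, \ldots, \ell$. The trailing row blocks of
$[\hat B \;\; \hat A]$ and $\hat E$ are empty if $r = n$. In the case when $r < n$, the trailing diagonal
blocks $A_{2\ell+1,2\ell+1}, E_{2\ell+1,2\ell+1} \in \mathbb{R}^{(n-r) \times (n-r)}$, and the pair
$(A_{2\ell+1,2\ell+1} - \lambda E_{2\ell+1,2\ell+1}, A_{2\ell+1,2\ell})$ is controllable.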

In the second reduction stage we use non-orthogonal upper triangular left and
right transformation matrices W and U, respectively, to annihilate the minimum
number of blocks in $\hat A$ and $\hat E$ which allows us to solve the minimum cover problem.
Assume W and U have block structures identical to $\hat E$. The following procedure
exploits the full rank of submatrices $A_{2j,2j-2}$ and $E_{2j-1,2j-1}$ to introduce zero blocks
in the block row $2j$ of $\hat A$ and block column $2j - 1$ of $\hat E$, respectively.

Stage II: Special Reduction for Type I Covers

Set $W = I_n$, $U = I_n$.
for $k = \ell, \ell - 1, \ldots, 2$
    Comment. Annihilate blocks $A_{2k,2j-1}$, for $j = k, k + 1, \ldots, \ell$.
    for $j = k, k + 1, \ldots, \ell$
        Compute $U_{2k-2,2j-1}$ such that $A_{2k,2k-2}U_{2k-2,2j-1} + A_{2k,2j-1} = 0$.
        $A_{i,2j-1} \leftarrow A_{i,2j-1} + A_{i,2k-2}U_{2k-2,2j-1}$, $i = 1, 2, \ldots, 2k$.
        $E_{i,2j-1} \leftarrow E_{i,2j-1} + E_{i,2k-2}U_{2k-2,2j-1}$, $i = 1, 2, \ldots, 2k$.
    end
    Comment. Annihilate blocks $E_{2k-2,2j-1}$, for $j = k, k + 1, \ldots, \ell$.
    for $j = k, k + 1, \ldots, \ell$
        Compute $W_{2k-2,2j-1}$ such that $W_{2k-2,2j-1}E_{2j-1,2j-1} + E_{2k-2,2j-1} = 0$.
        $A_{2k-2,i} \leftarrow A_{2k-2,i} + W_{2k-2,2j-1}A_{2j-1,i}$, $i = 2j - 2, 2j - 1, \ldots, 2\ell$.
        $E_{2k-2,i} \leftarrow E_{2k-2,i} + W_{2k-2,2j-1}E_{2j-1,i}$, $i = 2j, 2j + 1, \ldots, 2\ell$.
    end
end

For the considered example, this algorithm introduces the following zero
blocks: $A_{65}$, $E_{45}$, $A_{43}$, $A_{45}$, $E_{23}$, $E_{25}$ (in this order).

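Let $\tilde A := W\hat AU$, $\tilde E := W\hat EU$, and $\tilde B = [\,\tilde B_1 \;\; \tilde B_2\,] := W\hat B$ be the system matrices
resulting at the end of Stage II. Define also the feedback matrix $\tilde F \in \mathbb{R}^{m_2 \times n}$
partitioned column-wise compatibly with $\tilde A$

$$\tilde F = [\, \tilde F_1 \;\; O \;\; \tilde F_3 \;\; \cdots \;\; O \;\; \tilde F_{2\ell-1} \;\; O \,],$$

where $\tilde F_{2j-1} \in \mathbb{R}^{m_2 \times \nu_1^{(j)}}$ are such that $A_{2,0}\tilde F_{2j-1} + A_{2,2j-1} = 0$ for $j = 1, \ldots, \ell$.
For the considered example, we achieved with the above choice of $\tilde F$ that

$$\tilde A + \tilde B_2\tilde F = \begin{bmatrix}
A_{11} & A_{12} & A_{13} & A_{14} & A_{15} & A_{16} & A_{17} \\
O & A_{22} & O & A_{24} & O & A_{26} & A_{27} \\
A_{31} & A_{32} & A_{33} & A_{34} & A_{35} & A_{36} & A_{37} \\
O & A_{42} & O & A_{44} & O & A_{46} & A_{47} \\
O & O & A_{53} & A_{54} & A_{55} & A_{56} & A_{57} \\
O & O & O & A_{64} & O & A_{66} & A_{67} \\
O & O & O & O & O & A_{76} & A_{77}
\end{bmatrix},$$

$$\tilde E = \begin{bmatrix}
E_{11} & E_{12} & E_{13} & E_{14} & E_{15} & E_{16} & E_{17} \\
O & E_{22} & O & E_{24} & O & E_{26} & E_{27} \\
O & O & E_{33} & E_{34} & E_{35} & E_{36} & E_{37} \\
O & O & O & E_{44} & O & E_{46} & E_{47} \\
O & O & O & O & E_{55} & E_{56} & E_{57} \\
O & O & O & O & O & E_{66} & E_{67} \\
O & O & O & O & O & O & E_{77}
\end{bmatrix},$$

where some of the nonzero blocks have been modified with respect to their values
after Stage I.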
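Consider now the permutation matrix defined, for the considered example, by

$$P = \begin{bmatrix}
I_{\nu_1^{(1)}} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & I_{\nu_1^{(2)}} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & I_{\nu_1^{(3)}} & 0 & 0 \\
0 & I_{\nu_2^{(1)}} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & I_{\nu_2^{(2)}} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & I_{\nu_2^{(3)}} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & I_{n-r}
\end{bmatrix}.$$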



If we define L = PWQ^, V = ZUP^ and F = FV ' , then overall we achieved 
that 



L{A + B2F -XE)V 



Ai - XEi 
O 



Ao - XEn 



L[Bi\B2] 



O 



* 
B2 



where, by construction, the pairs {A\—)dE\,B\) and (A2 — l£2,B2) are in 
controllable staircase form. Thus, by the above choice of £, we made «2 '■— X]i=i ^'2 



of then eigenvalues of the A + B2F — /!,£ uncontrollable via Z?i. It is straightforward 

to show that the matrix Vi formed from the first rii — ^^ " ' 
satisfies 



J2i=i ^ I columns of V, 



AV, 



ViE-'Ai -B2FVU Bi = V^EV'B 



Thus, according to (20.45), V:=span(Vi) is a dynamic cover of Type I of 
dimension «]. It can be shown using the results of [8] that the resulting Type I 
dynamic cover V has minimum dimension. 
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For the considered example, we obtained the controllable staircase forms 



[Bi\Ai-XEi] 



^1,-1 
O 
O 



Au—XEu A13 — XE13 A15 — XE15 

^31 ^33^'^-E'33 ^35^''^-E'35 

O A53 355-AS55 



B2A2-XE2 



^2,0 

o 
o 
o 



A22 — XE22 A2A — XE2A ^26 ^ '^-£'26 ^27 ^ XE27 

A42 ^44 — A-C/44 A^^Q — Ail/46 A47 — A-C/47 

O Aq4 Aqq — XEqq Aqj — XEq-j 

O O Aje Aj7 - XE-n 



The Stage I reduction of system matrices to the special controllability form can
be performed by using exclusively orthogonal similarity transformations. It can be
shown that the computed condensed matrices $\hat A$, $\hat E$, and $\hat B$ are exact for matrices
which are nearby to the original matrices A, E, and B, respectively. Thus this part
of the reduction is numerically backward stable. When implementing the
algorithm, the row compressions are usually performed using rank revealing QR
factorizations with column pivoting.

To achieve an $O(n^3)$ computational complexity in the Stage I reduction, it is
essential to perform the row compressions simultaneously with maintaining the
upper triangular shape of E during reductions. The basic computational technique,
described in detail in [16], consists in employing elementary Givens
transformations from the left to introduce zero elements in the rows of B, while applying from
the right appropriate Givens transformations to annihilate the generated nonzero
subdiagonal elements in E. By performing the rank revealing QR decomposition
in this way (involving also column permutations), we can show that the overall
worst-case computational complexity of the special staircase algorithm is $O(n^3)$.
Note that for solving the problem in Sect. 20.5, the accumulation of Z is not even
necessary, since all right transformations can be directly applied to a third matrix
(e.g., a system output matrix C).
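
The interplay between left and right Givens transformations described above can be illustrated with a single elimination step. The sketch below is only an assumed, simplified illustration on dense Python lists (one column of B, no pivoting, no rank decisions); the function names are hypothetical and it is not the implementation of [16].

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def compress_row_keep_triangular(E, b, i):
    """Zero b[i] with a left rotation of rows i-1 and i of (E, b), then
    restore the upper triangular shape of E with a right rotation of
    columns i-1 and i. E is a list of rows; both arguments change in place."""
    n = len(E)
    # Left rotation: annihilates b[i] but creates a nonzero E[i][i-1].
    c, s = givens(b[i - 1], b[i])
    b[i - 1], b[i] = c * b[i - 1] + s * b[i], -s * b[i - 1] + c * b[i]
    for col in range(n):
        u, v = E[i - 1][col], E[i][col]
        E[i - 1][col], E[i][col] = c * u + s * v, -s * u + c * v
    # Right rotation: annihilates the subdiagonal element E[i][i-1].
    # It acts on columns of E only, so the zero in b[i] is preserved.
    c2, s2 = givens(E[i][i], E[i][i - 1])
    for row in range(n):
        u, v = E[row][i - 1], E[row][i]
        E[row][i - 1], E[row][i] = c2 * u - s2 * v, s2 * u + c2 * v


E = [[2.0, 1.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]]
b = [1.0, 2.0, 2.0]
compress_row_keep_triangular(E, b, 2)  # b[2] becomes zero, E stays triangular
```

Each such step touches only two rows and two columns, which is why interleaving the row compression with the restoration of the triangular shape avoids recomputing a full factorization of E at every stage.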

The computations at the Stage II reduction to determine a basis for the minimal
dynamic cover and the computation of the feedback matrix F involve the solution
of many, generally overdetermined, linear equations. For the computation of the
basis for $\mathcal{V}$ it is important to estimate the condition numbers of the overall
transformation matrices. This can be done by computing $\|V\|_F = \|U\|_F$ and
$\|L\|_F = \|W\|_F$ as estimations of the corresponding condition numbers. If these
norms are relatively small (e.g., < 10,000) then practically there is no danger of a
significant loss of accuracy due to nonorthogonal reductions. On the contrary, large
values of these norms provide a clear hint of potential accuracy losses. In practice,
it suffices only to look at the largest magnitudes of elements of W and U used at
Stage II to obtain equivalent information. For the computation of F, condition
numbers for solving the underlying equations can be also easily estimated. A large
norm of F is an indication of possible accuracy losses. For the Stage II reduction, a
simple operation count is possible by assuming all blocks are 1 x 1, and this indicates a
computational complexity of $O(n^3)$.

Chapter 21

Optimal Control of Switched System with Time Delay Detection of Switching Signal



C. Z. Wu, K. L. Teo and R. Volker



Abstract This paper deals with optimal control problems governed by switched 
systems with time delay detection of switching signal. We consider the switching 
sequence as well as the switching instants as decision variables. We present a two- 
level optimization method to solve it. In the first level, we fix the switching 
sequence, and introduce a time scaling transformation such that the switching 
instants are mapped into pre-assigned fixed knot points. Then, the transformed 
problem becomes a standard optimal parameter selection problem, and hence can 
be solved by many optimal control techniques and the corresponding optimal 
control software packages, such as MISER. In the second level, we consider the 
switching sequence as decision variables. We introduce a discrete filled function 
method to search for a global optimal switching sequence. Finally, a numerical 
example is presented to illustrate the efficiency of our method. 



21.1 Introduction 

A switched system is a special hybrid system. It consists of several subsystems
and a switching law for assigning the active subsystem at each time instant. Many
real-world processes, such as chemical processes, automotive systems, and
manufacturing processes, can be modeled as such systems.

There is considerable interest among researchers in the study of optimal
control problems of switched systems. See, for example, [1-6]. In [5], a survey of
several interesting optimal control problems and methods for switched systems is
reported. Among them, a particularly important approach is the one based on the
parametrization of switching instants in [6], where the switching instants are
parameterized by some parameters. Then, the gradient formula of the cost
functional with respect to these parameters is derived. Thus, the original problem can
be solved by any efficient gradient-based optimization method. This method is
similar to the transformation developed in [7]. It is very effective for computing
the optimal switching instants. However, the switching sequence is assumed fixed
in [4]. In [6], an optimal control problem governed by a bi-modal switched system
with variable switching sequence is considered. It is then shown that this switched
system is embedded into a larger family of systems and the set of trajectories of the
switched system is dense in the set of those of the embedded system. Then, the
optimal control problem governed by the larger system is considered instead of the
original problem. Based on the relationship between the two optimal control
problems, it is shown that if the latter optimal control problem admits a bang-bang
solution, then this solution is an optimal solution of the original problem.
Otherwise, a suboptimal solution can be obtained via the Chattering Lemma. This
method is effective only for the bi-modal case. The extension to multi-modal cases
requires further investigation.

All the optimal control problems considered above are based on the
assumption that the detection of the switching signal is instantaneous. However,
in practice, we may not be able to detect the change of the switching signal
instantly. We often realize such a change only after a time period. In [8], the
stabilization of such systems is considered. In this paper, we consider an optimal
control problem governed by such a system. In our problem, the optimal control 
is considered to be of feedback form; and both the switching instants and the 
switching sequence are considered as decision variables. This optimal control 
problem is to be solved as a two-level optimization problem. In the first level, 
we fix the switching sequence and introduce a time scaling transformation to 
map the switching instants into pre-assigned fixed knot points. Then, it can be 
solved by some optimal control software packages, such as MISER. In the 
second level, we introduce a discrete filled function method to search for an 
optimal switching sequence. 

The paper is organized as follows. We formulate the problem in Sect. 21.2. In
Sect. 21.3, we decompose the original problem into a two-level optimization
problem. Then, a time scaling transformation is introduced to map the switching
instants into pre-fixed knot points. The transformed problem can be solved by
many optimal control software packages, such as MISER. In Sect. 21.4, we
introduce a discrete filled function method to determine the optimal switching
sequence. In Sect. 21.5, an illustrative example is presented. Section 21.6
concludes the paper.



21.2 Problem Formulation

Consider a switched system given by

$$\dot x(t) = A_{\delta(t)}x(t) + B_{\delta(t)}u(t), \quad t \in [0, T], \tag{21.1}$$

with initial condition

$$x(0) = x_0, \tag{21.2}$$

where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^m$ is the control vector, the switching signal
$\delta(t) \in \{1, 2, \ldots, N\}$ is a right continuous function and its discontinuity points are
$\tau_1, \tau_2, \ldots, \tau_{M-1}$ such that

$$0 = \tau_0 < \tau_1 < \cdots < \tau_{M-1} < \tau_M = T, \tag{21.3}$$

while for each $i \in \{1, 2, \ldots, N\}$, $A_i \in \mathbb{R}^{n \times n}$ and $B_i \in \mathbb{R}^{n \times m}$. $\gamma(t)$ is the detection
function of $\delta(t)$, and $\tau > 0$ is the time-delay, which is the time required to detect
which subsystem is active.

The control u is of a piecewise constant feedback form, i.e.,

$$u(t) = K_{\gamma(t)}x(t), \tag{21.4}$$

where

$$\gamma(t) = \delta(t - \tau), \quad t \in [0, T], \tag{21.5}$$

$$K_{\gamma(t)} \in \{K_1, \ldots, K_N\}, \quad t \in [0, T], \tag{21.6}$$

and $K_i \in \mathbb{R}^{m \times n}$, $i \in \{1, 2, \ldots, N\}$, are to be designed.
To simplify the notation, let

$$\delta(t) = i_k, \quad \text{if } t \in [\tau_{k-1}, \tau_k), \quad k = 1, \ldots, M. \tag{21.7}$$

For a given switching signal $\delta(t)$ and the control expressed as a piecewise constant
feedback form given by (21.4), the evolution of the dynamical system (21.1) can
be described as in Fig. 21.1.

Let $\boldsymbol{\tau} = [\tau_1, \ldots, \tau_{M-1}]^T$, $K = [K_1, K_2, \ldots, K_N] \in \mathbb{R}^{m \times nN}$, $I = [i_1, i_2, \ldots, i_M]$. The
optimal control problem dealt with in this paper can be stated as follows.

Problem 1 Consider the dynamical system (21.1) with the initial condition (21.2) and
the piecewise constant control given by (21.4)-(21.6). Find a $(\boldsymbol{\tau}, K, I)$ such that

$$J(\boldsymbol{\tau}, K, I) = x^T(T)Px(T) + \int_0^T \big[ x^T(t)Qx(t) + u^T(t)Ru(t) \big]\, dt \tag{21.8}$$

is minimized subject to the constraint (21.3), where P, Q, and R are matrices with
appropriate dimensions.

Fig. 21.1 Evolution of the system (21.1)

Note that the piecewise constant control given by (21.4) is determined by the
control switching points $\tau_1 + \tau, \tau_2 + \tau, \ldots, \tau_{M-1} + \tau$, where $\tau_1, \tau_2, \ldots, \tau_{M-1}$ are the
sub-system switching time points, which are decision variables to be optimized.
Problem 1 cannot be solved directly using existing numerical optimal control
techniques. However, by using a time scaling transformation developed in [7, 9,
10], we shall show in the next section that Problem 1 is, in fact, equivalent to an
optimal parameter selection problem, where the varying switching points are
mapped into pre-assigned knot points in a new time scale.

To proceed further, we assume that the following condition is satisfied:

Assumption 1 All the switching durations are larger than the delay time $\tau$, i.e.,

$$\tau_i - \tau_{i-1} > \tau, \quad i = 1, \ldots, M. \tag{21.9}$$

21.3 Problem Reformulation 

In this section, we will show that Problem 1 can be transformed into a standard
parameter selection problem.
We re-write (21.1) as follows:

$$\dot x(t) = \sum_{j=1}^{N} \kappa_{i,j}\big(A_jx(t) + B_ju(t)\big), \quad \text{if } t \in [\tau_{i-1}, \tau_i], \quad i = 1, \ldots, M, \tag{21.10}$$

where $\kappa_{i,j}$, $i = 1, \ldots, M$; $j = 1, \ldots, N$, are logical integer variables. Since only one
sub-system is active at each time point, $\kappa_{i,j}$, $i = 1, \ldots, M$; $j = 1, \ldots, N$, are
required to satisfy

$$\sum_{j=1}^{N} \kappa_{i,j} = 1, \quad \kappa_{i,j} \in \{0, 1\}, \quad i = 1, \ldots, M; \; j = 1, \ldots, N. \tag{21.11}$$

Let $\kappa = [\kappa_{1,1}, \ldots, \kappa_{1,N}, \ldots, \kappa_{M,N}]^T$. We introduce the following time scaling
transformation:



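$$\frac{dt}{ds} = \sum_{i=1}^{M} \xi_i\chi_{(i-1/2,\,i]}(s) + \sum_{i=0}^{M-1} 2\tau\chi_{(i,\,i+1/2]}(s), \quad s \in [0, M], \tag{21.12}$$

where

$$\xi_i = 2(\tau_i - \tau_{i-1} - \tau), \quad i = 1, \ldots, M, \tag{21.13}$$

and $\chi_I(s)$ is the indicator function defined by

$$\chi_I(s) = \begin{cases} 1, & \text{if } s \in I, \\ 0, & \text{else.} \end{cases}$$

Let $\xi = [\xi_1, \xi_2, \ldots, \xi_M]^T$. By (21.3) and (21.9), we have

$$\sum_{i=1}^{M} \Big( \frac{\xi_i}{2} + \tau \Big) = T, \quad \xi_i > 0, \quad \text{for } i = 1, \ldots, M. \tag{21.14}$$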



After this time scaling transformation, the sub-system switching points
$\tau_1, \tau_2, \ldots, \tau_{M-1}$ have been mapped into $1, 2, \ldots, M - 1$, and the control switching
points $\tau, \tau_1 + \tau, \tau_2 + \tau, \ldots, \tau_{M-1} + \tau$ have been mapped into $1/2, 3/2, \ldots, M - 1/2$.
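
The mapping between the two time scales can be made concrete with a few lines of code. The sketch below integrates dt/ds piecewise under the assumption that the rate equals 2τ on (i, i + 1/2] and ξ_{i+1} = 2(τ_{i+1} - τ_i - τ) on (i + 1/2, i + 1]; the function name `time_scaling` and its argument names are hypothetical, and the switching instants used in the example are those reported in Sect. 21.5.

```python
def time_scaling(switch_times, delay, horizon):
    """Return (xi, t_of_s) for sub-system switching times tau_1..tau_{M-1},
    detection delay tau, and final time T.

    t_of_s maps the new time s in [0, M] to the original time t in [0, T]."""
    taus = [0.0] + list(switch_times) + [horizon]
    M = len(taus) - 1
    durations = [taus[i] - taus[i - 1] for i in range(1, M + 1)]
    if any(d <= delay for d in durations):
        raise ValueError("Assumption 1 violated: a duration is not larger than the delay")
    xi = [2.0 * (d - delay) for d in durations]

    def t_of_s(s):
        if not 0.0 <= s <= M:
            raise ValueError("s must lie in [0, M]")
        i = min(int(s), M - 1)      # unit interval [i, i + 1] containing s
        frac = s - i
        if frac <= 0.5:             # first half: rate 2 * delay
            return taus[i] + 2.0 * delay * frac
        return taus[i] + delay + xi[i] * (frac - 0.5)   # second half: rate xi

    return xi, t_of_s


xi, t_of_s = time_scaling([0.39354, 0.45354, 0.94], delay=0.01, horizon=1.0)
print([t_of_s(s) for s in (0.5, 1.0, 1.5, 4.0)])
```

Integer values of s land on the sub-system switching points and half-integer values on the control switching points, so the knots in s are fixed regardless of the values of the switching times.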
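Then, system (21.1) is transformed into

$$\frac{dx(s)}{ds} = \sum_{j=1}^{N} \kappa_{i,j}\big(A_jx(s) + B_ju(s)\big)\big(\xi_i\chi_{(i-1/2,\,i]}(s) + 2\tau\chi_{(i-1,\,i-1/2]}(s)\big), \quad \text{if } s \in [i-1, i], \; i = 1, \ldots, M, \tag{21.15}$$

that is,

$$\frac{dx(s)}{ds} = \begin{cases}
2\tau \sum_{j=1}^{N} \kappa_{1,j}\big(A_j + B_jK_{\gamma(s)}\big)x(s), & \text{if } s \in [0, 1/2], \\
\xi_1 \sum_{j=1}^{N} \kappa_{1,j}\big(A_j + B_jK_{\gamma(s)}\big)x(s), & \text{if } s \in (1/2, 1], \\
\quad \vdots \\
\xi_M \sum_{j=1}^{N} \kappa_{M,j}\big(A_j + B_jK_{\gamma(s)}\big)x(s), & \text{if } s \in (M - 1/2, M],
\end{cases} \tag{21.16}$$

with initial condition

$$x(0) = x_0, \tag{21.17}$$

and the cost functional (21.8) is transformed into

$$J(\xi, K, \kappa) = x^T(M)Px(M) + \int_0^M \big( x^T(s)Qx(s) + u^T(s)Ru(s) \big) \Big( \sum_{i=1}^{M} \xi_i\chi_{(i-1/2,\,i]}(s) + \sum_{i=0}^{M-1} 2\tau\chi_{(i,\,i+1/2]}(s) \Big)\, ds, \tag{21.18}$$

that is,

$$J(\xi, K, \kappa) = x^T(M)Px(M) + \sum_{k=0}^{M-1} \int_k^{k+1/2} 2\tau\, x^T(s)\big(Q + K_{\gamma(s)}^TRK_{\gamma(s)}\big)x(s)\, ds + \sum_{k=1}^{M} \int_{k-1/2}^{k} \xi_k\, x^T(s)\big(Q + K_{\gamma(s)}^TRK_{\gamma(s)}\big)x(s)\, ds, \tag{21.19}$$

where, for simplicity of notation, we write $x(s) = x(t(s))$, $\delta(s) = \delta(t(s))$,
$\gamma(s) = \gamma(t(s))$. Now the transformed problem may be formally stated as:

Problem 2 Subject to system (21.16) with initial condition (21.17), find a triple
$(\xi, K, \kappa)$ such that the cost functional (21.19) is minimized subject to the constraints
(21.14) and (21.11).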

For easy reference, the equivalence between Problem 1 and Problem 2 is stated
in the following as a theorem.

Theorem 3.1 Problem 1 is equivalent to Problem 2 in the sense that if
$(\boldsymbol{\tau}^*, K^*, I^*)$ is an optimal solution of Problem 1, then $(\xi^*, K^*, \kappa^*)$ is an optimal
solution of Problem 2, where the relation between $\boldsymbol{\tau}^*$ and $\xi^*$ is determined by
(21.13), and vice versa.

Note that

$$\min_{\xi, K, \kappa} J(\xi, K, \kappa) = \min_{\kappa} \min_{\xi, K} J(\xi, K, \kappa).$$

Thus, Problem 2 can be posed as a two-level optimization problem. The first
level is

$$J(\kappa) = \min_{\xi, K} J(\xi, K, \kappa) \quad \text{subject to (21.14)}, \tag{21.20}$$

and the second level is

$$\min_{\kappa} J(\kappa) \quad \text{subject to (21.11)}. \tag{21.21}$$

For easy reference, let the optimization problem (21.20) be referred to as
Problem 3 and the optimization problem (21.21) be referred to as Problem 4. For
each $\kappa$, Problem 3 is a standard optimal control problem and can be solved by
gradient-based optimization methods. The required gradient formulas can be
obtained from Theorem 5.2.1 in [11]. Thus, for each $\kappa$, Problem 3 can be solved
by many available control software packages, such as MISER 3.3 [12].

Problem 4 is a discrete optimization problem. We will introduce a discrete
filled function method to solve it in the next section.



21.4 Determination of the Switching Sequence

In Sect. 21.3, we have shown that Problem 1 with fixed switching sequence can be 
reformulated as an optimal parameter selection problem and hence is solvable by 
many optimal control software packages, such as MISER 3.3. In this section, we 
will introduce a method to determine its switching sequence. 

Note that the determination of a switching sequence is equivalent to the
determination of the logical integer variables $\kappa_{i,j}$, $i = 1, \ldots, M$; $j = 1, \ldots, N$. It is a
discrete optimization problem. Here, we will introduce a modified discrete filled
function method developed in [13-15].
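
The correspondence between a switching sequence and the logical variables can be sketched as a one-hot encoding. The helper names below (`encode_sequence`, `decode_sequence`, `is_feasible`) are hypothetical; the ordering of the entries follows the vector $\kappa = [\kappa_{1,1}, \ldots, \kappa_{1,N}, \ldots, \kappa_{M,N}]$, and the example reproduces the 12-element vectors of Sect. 21.5 for N = 3 sub-systems and M = 4 intervals.

```python
def encode_sequence(seq, n_sub):
    """Map a switching sequence (i_1, ..., i_M), modes numbered 1..n_sub,
    to the 0/1 vector kappa with kappa[(k-1)*n_sub + (j-1)] = 1 iff i_k = j."""
    kappa = []
    for mode in seq:
        if not 1 <= mode <= n_sub:
            raise ValueError("mode index out of range")
        kappa.extend(1 if j == mode else 0 for j in range(1, n_sub + 1))
    return kappa


def is_feasible(kappa, n_sub):
    """Check the constraint: entries in {0, 1}, exactly one 1 per interval."""
    if len(kappa) % n_sub != 0:
        return False
    blocks = [kappa[k:k + n_sub] for k in range(0, len(kappa), n_sub)]
    return all(set(b) <= {0, 1} and sum(b) == 1 for b in blocks)


def decode_sequence(kappa, n_sub):
    """Inverse of encode_sequence for a feasible kappa."""
    if not is_feasible(kappa, n_sub):
        raise ValueError("kappa does not satisfy the one-hot constraint")
    return [kappa[k:k + n_sub].index(1) + 1 for k in range(0, len(kappa), n_sub)]


print(encode_sequence([3, 1, 1, 1], n_sub=3))
```

With this encoding, the feasible set of (21.11) contains exactly $N^M$ points, one for each switching sequence.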

Let $e_{i,-j}$ be an element of $\mathbb{R}^{NM}$ with the ith component 1, the jth component $-1$,
and the remaining components 0. Similarly, let $e_{-i,j}$ be an element of $\mathbb{R}^{NM}$ with the
ith component $-1$, the jth component 1, and the remaining components 0.

Let $D = \{e_{i,-j}, e_{-i,j}, \; i, j = 1, \ldots, NM, \; i \neq j\}$ and $\Pi$ be the set of $\kappa$ which
satisfies (21.11).

Definition 4.1 For any $\kappa \in \Pi$, $N(\kappa) = \{\kappa + d : d \in D\} \cap \Pi$ denotes the
neighborhood of the integer point $\kappa$.

Definition 4.2 A point $\kappa^* \in \Pi$ is said to be a discrete local minimizer of
Problem 4 if $J(\kappa^*) \le J(\kappa)$ for any $\kappa \in N(\kappa^*) \cap \Pi$. Furthermore, if $J(\kappa^*) < J(\kappa)$ for any
$\kappa \in N(\kappa^*) \cap \Pi$, then $\kappa^*$ is said to be a strict discrete local minimizer.

Definition 4.3 A point $\kappa^* \in \Pi$ is said to be a discrete global minimizer if
$J(\kappa^*) \le J(\kappa)$ holds for any $\kappa \in \Pi$.

Definition 4.4 A sequence $\{\kappa^i\}_{i=1}^{k}$ is called a discrete path in $\Pi$ between $\kappa^{1,*} \in \Pi$
and $\kappa^{2,*} \in \Pi$ if the following conditions are satisfied:

1. For any $i = 1, \ldots, k$, $\kappa^i \in \Pi$;

2. For any $i \neq j$, $\kappa^i \neq \kappa^j$;

3. $\kappa^1 = \kappa^{1,*}$, $\kappa^k = \kappa^{2,*}$; and

4. $\|\kappa^{i+1} - \kappa^i\| = 2$, $i = 1, \ldots, k - 1$.

We note that $\Pi$ is a discrete pathwise connected set. That is, for every two
different points $\kappa^1$, $\kappa^2$, we can find a path from $\kappa^1$ to $\kappa^2$ in $\Pi$. Clearly, $\Pi$ is bounded.




Algorithm 4.1 (Local search)

1. Choose a $\kappa_0 \in \Pi$;

2. If $\kappa_0$ is a local minimizer, then stop. Otherwise, we search the neighborhood of
$\kappa_0$ and obtain a $\kappa \in N(\kappa_0) \cap \Pi$ such that $J(\kappa) < J(\kappa_0)$.

3. Let $\kappa_0 = \kappa$, go to Step 2.
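
The local search can be sketched directly from the definitions of the direction set D and the feasible set. The code below is an illustrative sketch only: the names `neighborhood` and `local_search` are hypothetical, the cost used in the example is a toy function (the number of entries differing from a target vector) rather than the value of Problem 3, and the first improving neighbor is accepted at each step.

```python
def is_feasible(kappa, n_sub):
    """Entries in {0, 1} and exactly one active mode per switching interval."""
    if len(kappa) % n_sub != 0:
        return False
    blocks = [kappa[k:k + n_sub] for k in range(0, len(kappa), n_sub)]
    return all(set(b) <= {0, 1} and sum(b) == 1 for b in blocks)


def neighborhood(kappa, n_sub):
    """All points kappa + d that remain feasible, where d has one component
    equal to +1, one equal to -1, and zeros elsewhere."""
    found = []
    for i in range(len(kappa)):
        for j in range(len(kappa)):
            if i != j:
                cand = list(kappa)
                cand[i] += 1
                cand[j] -= 1
                if is_feasible(cand, n_sub):
                    found.append(tuple(cand))
    return found


def local_search(cost, kappa0, n_sub):
    """Move to an improving feasible neighbor until none exists."""
    current = tuple(kappa0)
    while True:
        improving = [k for k in neighborhood(current, n_sub) if cost(k) < cost(current)]
        if not improving:
            return current          # discrete local minimizer
        current = improving[0]


# Toy cost (assumption for illustration): mismatch count against a target.
target = (0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0)
start = (1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0)
cost = lambda k: sum(a != b for a, b in zip(k, target))
print(local_search(cost, start, n_sub=3))
```

Because feasibility forces the +1 and the -1 to fall in the same interval, each feasible neighbor changes the active sub-system of exactly one interval, so a point has M(N - 1) feasible neighbors.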

Definition 4.5 $p(\kappa, \kappa^*)$ is called a discrete filled function of $J(\kappa)$ at a discrete
local minimizer $\kappa^*$ if it satisfies the following properties:

1. $\kappa^*$ is a strict discrete local maximizer of $p(\kappa, \kappa^*)$ over $\Pi$;

2. $p(\kappa, \kappa^*)$ has no discrete local minimizers in the region

$$S_1 = \{\kappa : J(\kappa) \ge J(\kappa^*), \; \kappa \in \Pi \setminus \kappa^*\};$$

3. If $\kappa^*$ is not a discrete global minimizer of $J(\kappa)$, then $p(\kappa, \kappa^*)$ has a discrete
minimizer in the region

$$S_2 = \{\kappa : J(\kappa) < J(\kappa^*), \; \kappa \in \Pi\}.$$



Now, we give a discrete filled function which is introduced in [15].

$$F(\kappa, \kappa^*, q, r) = \frac{1}{q + \|\kappa - \kappa^*\|}\,\varphi_q\big(\max\{J(\kappa) - J(\kappa^*) + r, 0\}\big), \tag{21.22}$$

where

$$\varphi_q(t) = \begin{cases} \exp(-q/t), & \text{if } t \neq 0, \\ 0, & \text{if } t = 0, \end{cases}$$

and r satisfies

$$0 < r < \max_{\bar\kappa^*, \kappa^* \in L(P),\; J(\bar\kappa^*) > J(\kappa^*)} \big(J(\bar\kappa^*) - J(\kappa^*)\big), \tag{21.23}$$

where $L(P)$ denotes the set of discrete local minimizers of $J(\kappa)$. Next, we will show
that for a proper choice of q and r, $F(\kappa, \kappa^*, q, r)$ is a discrete filled function.

Theorem 4.1 Suppose that $\kappa^*$ is a discrete local minimizer of $J(\kappa)$. Then, for
proper choices of $q > 0$ and $r > 0$, $\kappa^*$ is a discrete local maximizer of
$F(\kappa, \kappa^*, q, r)$.

Proof The proof is similar to that given for Theorem 3.1 [15].
Let 

rP= max {/(k/)-/(k2)}. 

Then, we have the following lemma.

Lemma 4.1 Suppose that $\kappa^*$ is a local minimizer of $J(\kappa)$. For any $\kappa_1, \kappa_2 \in \Pi$, let
the following conditions be satisfied:

1. $J(\kappa_1) \ge J(\kappa^*)$ and $J(\kappa_2) \ge J(\kappa^*)$,

2. $\|\kappa_2 - \kappa^*\| > \|\kappa_1 - \kappa^*\|$.

Then, when $r > 0$ and $q > 0$ are sufficiently small, $F(\kappa_2, \kappa^*, q, r) < F(\kappa_1, \kappa^*, q, r)$.

Proof The proof is similar to that given for Theorem 3.4 [15].

Theorem 4.2 Suppose that $\kappa^*$ is a local minimizer of $J(\kappa)$. Then, $F(\kappa, \kappa^*, q, r)$
has no discrete local minimizers in the region

$$S_1 = \{\kappa \mid J(\kappa) \ge J(\kappa^*), \; \kappa \in \Pi \setminus \kappa^*\}$$

if $r > 0$ and $q > 0$ are chosen appropriately small.

Proof Suppose the conclusion was false. Then, there exists ak* € Si such that for 
all K e n n A^(k*), we have J{k) >J{k*) >J{k*). We claim that there exists a 
d eD such that Hie* - k* - rf|| > ||k:* - k*|| and k* - d e nnA^(ic*). To estab- 
lish this claim, we note that \\k* — k*\\~= J2i=\ i*^! ~ '^i) ■ There are two cases to 
be considered, (i) there exists some i,j such that k* — k^ > and k* — k* < 0. (ii) 
k^-K{>0 or k* ~ Ki <0 for all \<i<NM. For case (ii), let / = 

maxi<i<A,M|Ki - KjI and y = miuj <i.</vM|Ki - k1|- Now choose d* = ei^^j. 
Since k* — rf £ Af(ie*), we have J{k* — d)>J{K*). By Lemma 4.1, we have 
F{k* — d, K*,q,r) <F{k*, k* ,q,r). This is a contradiction as k* is a discrete local 
minimizer.Thus, the conclusion of the theorem follows. 

Theorem 4.3 Suppose that $\kappa^*$ is a local but not a global minimizer of $J(\kappa)$. Then,
$F(\kappa, \kappa^*, q, r)$ has a discrete minimizer in the region

$$S_2 = \{\kappa \mid J(\kappa) < J(\kappa^*), \; \kappa \in \Pi\}.$$

Proof Since $\kappa^*$ is a local but not a global minimizer of $J(\kappa)$, there exists a point
$\bar\kappa^* \in \Pi$ such that $J(\bar\kappa^*) + r < J(\kappa^*)$, where r is chosen according to (21.23).
Hence, $F(\bar\kappa^*, \kappa^*, q, r) = 0$. However, $F(\kappa, \kappa^*, q, r) \ge 0$ for all $\kappa \in \Pi$. This implies
that $\bar\kappa^*$ is a discrete minimizer and satisfies $J(\bar\kappa^*) < J(\kappa^*)$.

By Theorem 4.1, Theorem 4.2 and Theorem 4.3, the function $F(\kappa, \kappa^*, q, r)$ is,
indeed, a discrete filled function if $r > 0$ and $q > 0$ are chosen appropriately small.
Now we present a numerical algorithm to search for a global minimizer of $J(\kappa)$
over $\Pi$ based on the theory established above.

Algorithm 4.2 

1. Take an initial point $\kappa_j \in \Pi$ and initialize the tolerance $\varepsilon$. Let 
$r = 0.1$, $q = 0.01$. 

2. From $\kappa_j$, use Algorithm 4.1 to find a local minimizer $\kappa_j^*$ of $J(\kappa)$ over $\Pi$. 

3. If $r < \varepsilon$, stop. Otherwise, construct a discrete filled function 

$$F(\kappa, \kappa_j^*, q, r) = \frac{1}{q + \|\kappa - \kappa_j^*\|}\, \varphi_q\bigl(\max\{J(\kappa) - J(\kappa_j^*) + r,\ 0\}\bigr).$$

Use Algorithm 4.1 to find its local minimizer $\bar{\kappa}^*$. In the process of minimizing 
the discrete filled function $F(\kappa, \kappa_j^*, q, r)$, if we find an iterative point $\kappa$ such 
that $J(\kappa) < J(\kappa_j^*)$, then let $\kappa_j = \kappa$ and go to Step 1. 

4. If $J(\bar{\kappa}^*) < J(\kappa_j^*)$, let $\kappa_j = \bar{\kappa}^*$ and go to Step 1. Otherwise, set $r = r/10$, 
$q = q/10$ and go to Step 3. 
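
The loop of Algorithm 4.2 can be illustrated on a toy problem. The following Python sketch is not the authors' implementation, and several pieces are assumptions: `local_search` is a plain steepest-descent stand-in for Algorithm 4.1 (which is not reproduced in this section), the auxiliary function `phi` uses an assumed form that vanishes at 0, the filled function is minimized from every neighbor of the current local minimizer (a multi-start choice made here for robustness), and "go to Step 1" is read as resetting $r$ and $q$ and restarting the local search from the new point. The feasible set is a hypothetical integer box and `TABLE` is made-up data.

```python
import math

def neighbors(k, lo, hi):
    """Points that differ from k by +/-1 in one coordinate and stay in the box."""
    for i in range(len(k)):
        for step in (-1, 1):
            v = k[i] + step
            if lo <= v <= hi:
                yield k[:i] + (v,) + k[i + 1:]

def local_search(f, start, lo, hi, early_exit=None):
    """Assumed stand-in for Algorithm 4.1: discrete steepest descent."""
    k = start
    while True:
        if early_exit is not None and early_exit(k):
            return k
        best = min(neighbors(k, lo, hi), key=f)
        if f(best) >= f(k):          # no strictly better neighbor
            return k
        k = best

def phi(t, q):
    """Assumed auxiliary function: 0 at t = 0, increasing for t > 0."""
    return math.exp(-q ** 3 / t) if t > 0 else 0.0

def filled(J, k_star, q, r):
    """F(k, k*, q, r) = phi_q(max{J(k) - J(k*) + r, 0}) / (q + ||k - k*||)."""
    J_star = J(k_star)
    def F(k):
        return phi(max(J(k) - J_star + r, 0.0), q) / (q + math.dist(k, k_star))
    return F

def global_search(J, start, lo, hi, eps=1e-3):
    k_star = local_search(J, start, lo, hi)              # Step 2
    r, q = 0.1, 0.01                                     # Step 1
    while r >= eps:                                      # Step 3 stopping rule
        F = filled(J, k_star, q, r)
        J_star = J(k_star)
        def better(k):
            return J(k) < J_star
        improved = False
        for s in neighbors(k_star, lo, hi):              # multi-start (assumption)
            cand = local_search(F, s, lo, hi, early_exit=better)
            if better(cand):                             # lower point found
                k_star = local_search(J, cand, lo, hi)
                r, q = 0.1, 0.01
                improved = True
                break
        if not improved:                                 # Step 4, second branch
            r, q = r / 10, q / 10
    return k_star

# Toy objective on the one-dimensional box {0, ..., 7} (made-up values).
TABLE = [5, 3, 4, 6, 7, 2, 1, 3]
def J_toy(k):
    return TABLE[k[0]]

print(local_search(J_toy, (0,), 0, 7), global_search(J_toy, (0,), 0, 7))
```

On this toy table, plain descent from the left end stops at the local minimizer with value 3, while the filled-function loop escapes it and ends at the point with value 1.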



21.5 Numerical Example 

In this section, we will apply the method developed in Sect. 21.3 and Sect. 21.4 to 
our test problems. 

Example 5.1 The three sub-systems are given by 

$$
\begin{cases} \dot{x}_1 = x_1 + u_1 \\ \dot{x}_2 = 2x_1 + 3x_2 + u_2 \end{cases}, \qquad
\begin{cases} \dot{x}_1 = x_1 + u_1 \\ \dot{x}_2 = x_1 + x_2 + u_2 \end{cases}, \qquad
\begin{cases} \dot{x}_1 = x_1 + x_2 + u_1 \\ \dot{x}_2 = 2x_1 - x_2 + u_2 \end{cases}
$$

The switching detection delay is 0.01. Let $\delta(t)$ be the switching function and $y(t)$ be 
the detection of the switching signal, i.e., $y(t) = \delta(t - 0.01)$. Let the maximum 
number of switchings be 3. Our objective is to find a feedback law $u(t) = [u_1, u_2] = 
K_{y(t)} x(t)$, where $K_{y(t)} \in \mathbb{R}$, such that the cost functional 

$$J(\delta, K_{y(t)}) = (x_1(1))^2 + (x_2(1) - 1.0)^2$$

is minimized subject to the following condition: 

$$-3 \le K_{y(t)} \le 3.$$
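
For a fixed switching sequence, evaluating the cost amounts to integrating the switched dynamics over $[0, 1]$ with a gain that switches 0.01 later than the state dynamics. The Python sketch below illustrates this under several assumptions that are not taken from the text: the initial state `x0` is supplied by the caller (it is not stated in this section), one scalar gain is used per interval, the first gain is already in force before the first detected switch, and a fixed-step Runge-Kutta scheme stands in for the integration that MISER 3.3 performs. The names `terminal_cost` and `index_at` are hypothetical.

```python
DELAY = 0.01  # switching detection delay of Example 5.1

# Right-hand sides of the three sub-systems, in the order listed in the example.
def f1(x, u): return (x[0] + u[0], 2 * x[0] + 3 * x[1] + u[1])
def f2(x, u): return (x[0] + u[0], x[0] + x[1] + u[1])
def f3(x, u): return (x[0] + x[1] + u[0], 2 * x[0] - x[1] + u[1])
SUBSYSTEMS = {1: f1, 2: f2, 3: f3}

def index_at(t, instants):
    """Number of switching instants that have already occurred at time t."""
    return sum(1 for s in instants if s <= t)

def terminal_cost(modes, taus, gains, x0, delay=DELAY, steps=10000):
    """Integrate over [0, 1] with RK4 and return x1(1)^2 + (x2(1) - 1)^2.

    modes: active sub-system on each interval (len(taus) + 1 entries)
    taus:  state switching instants; the gain switches `delay` later
    gains: scalar feedback gain on each interval (assumed parametrization)
    """
    control_taus = [tau + delay for tau in taus]
    h = 1.0 / steps

    def rhs(t, x):
        f = SUBSYSTEMS[modes[index_at(t, taus)]]
        k = gains[index_at(t, control_taus)]      # gain lags the mode by `delay`
        return f(x, (k * x[0], k * x[1]))

    def shifted(x, a, d):
        return (x[0] + a * d[0], x[1] + a * d[1])

    x = tuple(x0)
    for n in range(steps):
        t = n * h
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, shifted(x, h / 2, k1))
        k3 = rhs(t + h / 2, shifted(x, h / 2, k2))
        k4 = rhs(t + h, shifted(x, h, k3))
        x = (x[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             x[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return x[0] ** 2 + (x[1] - 1.0) ** 2

# Sanity check: sub-system 2 with gain -1 and no switching gives
# x1' = 0 and x2' = x1, so x(1) = (1, 1) from the hypothetical x(0) = (1, 0).
print(terminal_cost([2], [], [-1.0], (1.0, 0.0), delay=0.0))
```

In the two-level method this evaluation sits inside the first level, where the switching instants and the gains are the parameters being optimized for a given sequence.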



For a given switching sequence $\kappa = (i_1, i_2, i_3, i_4)$, we suppose that the switching 
points are $\tau_1, \tau_2, \tau_3$. First, we use the time scaling transformation (21.12) to map the 
state switching points $\tau_1, \tau_2, \tau_3$ and the control switching points $0.01$, $\tau_1 + 0.01$, 
$\tau_2 + 0.01$, $\tau_3 + 0.01$ into $1, 2, 3$ and $1/2$, $1 + 1/2$, $2 + 1/2$, $3 + 1/2$, respectively. 
Then, let the initial switching sequence be {1,0,0,0,1,0,0,0,1,1,0,0}. We 
incorporate MISER 3.3 as a sub-program and set the lower bound of the switching 
parameters to be 0.05. Using Algorithm 4.1 to search for a local optimal switching 
sequence, we obtain the local optimal switching sequence {0,0,1,1,0,0,1,0,0,1,0,0}; 
the corresponding switching-instant parameters and feedback constants 
are 0.38354, 0.05, 0.47646, 0.05, -2.603, -2.5069, -2.514, -2.4682, respectively. 
The corresponding optimal cost is 0.1727. Thus, the 3 state switching instants are 
0.39354, 0.45354, 0.94, and the control switching instants are 0.01, 0.40354, 0.46354, 
0.95. Then, we set the tolerance $\varepsilon$ = 10"^ and use the discrete filled function to find its 
local minimizer. However, we cannot find any local minimizer of the corresponding 
discrete filled function. The program stopped because $r$ reached the tolerance $\varepsilon$, and the last obtained 
switching sequence is {0,1,0,0,0,1,0,0,1,0,1,0}; the corresponding values of the 
discrete filled function and the cost functional are 0.12497 and 0.684409, respectively. 
Thus, the local minimizer {1,0,0,0,1,0,0,0,1,1,0,0} is also a global optimal 
switching sequence. The optimal state is depicted in Fig. 21.2. 

Fig. 21.2 The trajectory of the optimal state 
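
The reported switching instants can be recovered from the four interval parameters by simple arithmetic. The sketch below assumes (this reading is inferred from the numbers, not stated in the text) that each parameter is the time elapsed between a control switching instant and the next state switching instant, with the last parameter running to the final time 1; the function name `switching_instants` is hypothetical.

```python
DELAY = 0.01  # switching detection delay of Example 5.1

def switching_instants(durations, delay=DELAY):
    """Return (state_instants, control_instants) from the interval parameters.

    Assumed reading: state switch k occurs `durations[k]` after the previous
    control switch, and every control switch lags its state switch by `delay`.
    """
    state, t = [], 0.0
    for theta in durations[:-1]:          # the last parameter ends at the final time
        t += delay + theta
        state.append(round(t, 5))
    control = [delay] + [round(tau + delay, 5) for tau in state]
    return state, control

state, control = switching_instants([0.38354, 0.05, 0.47646, 0.05])
print(state)
print(control)
```

Under this reading the last state instant plus the delay and the final parameter adds up to the terminal time 1, and each control switching instant is the matching state switching instant shifted by 0.01.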



21.6 Summary 

In this paper, the optimal control problem governed by a switched system with 
time-delay detection is considered. Considering the switching sequence as well as the 
switching instants as decision variables, we developed a two-level optimization 
method to solve it. In the first level, the switching sequence is fixed, so the problem 
can be transformed into a parameter selection problem, which we solve using MISER 
3.3. In the second level, the switching sequence is considered as the 
decision variable. For this discrete optimization problem, we introduced the filled 
function method to solve it. Finally, a numerical example is presented to show 
the efficiency of our method. 